What is Milvus?
Milvus is an open-source vector database built specifically for storing and querying high-dimensional embeddings — the numerical representations that power semantic search, recommendation engines, and retrieval-augmented generation (RAG). Developed by Zilliz and donated to the LF AI & Data Foundation under the Apache 2.0 license, it has grown into one of the most widely deployed vector databases in production, with over 10,000 production deployments and 40,000+ GitHub stars as of 2026. Unlike general-purpose databases with vector extensions bolted on, Milvus was designed from the ground up for approximate nearest neighbor (ANN) search, supporting all major index types, including HNSW, DiskANN, and IVF, alongside native BM25 full-text search. Milvus 2.6, with point releases continuing through early 2026, introduced automatic embedding precision conversion that cuts memory requirements by up to 50% without meaningful recall loss.
Key Takeaways
- Milvus handles tens of billions of vectors with horizontal scaling — a scale that pgvector and in-memory FAISS cannot practically reach.
- Self-hosted Milvus Cluster requires running five-plus services (etcd, MinIO, Pulsar, plus Milvus nodes) — operational overhead most teams underestimate.
- Zilliz Cloud's 2026 pricing overhaul cut storage costs 87% via tiered storage, making Milvus-based RAG viable for mid-market companies that couldn't previously justify the bill.
- Milvus supports hybrid dense + sparse vector search natively, eliminating the need for a separate keyword search system in RAG pipelines.
- Milvus expertise appears in AI engineering job postings alongside LangChain, OpenAI APIs, and Python — rarely as a standalone requirement.
What Makes Milvus Stand Out
Milvus's core strength is that it treats vector search as a first-class database operation rather than an extension. Where a database like PostgreSQL with pgvector must work around a row-oriented storage engine, Milvus was built from scratch around approximate nearest neighbor search — and that difference shows at scale.
The architecture separates storage, compute, and coordination into layers that scale independently. DiskANN support means you can run billion-scale indexes without loading everything into RAM — the alternative to throwing 256GB+ memory machines at the problem. Hybrid search combines dense vector similarity with BM25 sparse keyword search in a single query, which produces meaningfully better recall for RAG pipelines than either technique alone. Multiple vector fields per collection let you store both text and image embeddings in one record and rank results across both simultaneously — without joining across two separate databases.
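To make the hybrid search idea concrete, here is a toy sketch of reciprocal rank fusion (RRF), the kind of rank-merging strategy Milvus exposes for combining dense and sparse result lists (the pymilvus client calls its implementation `RRFRanker`). The document IDs and the k=60 smoothing constant below are illustrative assumptions, not values from this article.

```python
def rrf_fuse(rankings, k=60):
    """Merge several ranked ID lists with reciprocal rank fusion.

    Each appearance contributes 1 / (k + rank) to a document's score,
    so documents ranked highly by *both* dense and sparse search
    rise to the top of the fused list.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists: dense ANN search vs. BM25 keyword search.
dense = ["doc_a", "doc_b", "doc_c"]
sparse = ["doc_b", "doc_d", "doc_a"]
fused = rrf_fuse([dense, sparse])
# fused[:2] == ["doc_b", "doc_a"]: doc_b appears near the top of both
# lists, so it outranks doc_a, which led only the dense results.
```

In real use, Milvus performs this fusion server-side in a single query, which is what removes the need for a separate keyword search system.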
Production Gotchas Teams Learn the Hard Way
Milvus's performance numbers are real — but they come with conditions that don't always appear in the getting-started docs.
The memory requirement is the first surprise. Headline latency figures assume the vector index fits in RAM. At billion-vector scale with 768-dimensional embeddings, that means machines with hundreds of gigabytes of memory. DiskANN is the escape hatch, but it trades latency for the ability to run on commodity hardware. The second surprise is filtered search degradation: adding scalar metadata filters to an ANN query can force Milvus to fall back to brute-force scanning depending on filter selectivity, causing latency spikes teams didn't anticipate in load testing.
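A back-of-the-envelope sketch of why filter selectivity drives that degradation (the numbers are illustrative, not from this article): to return top-k results that also pass a scalar filter, the ANN stage must surface roughly k divided by the filter's selectivity candidates, assuming matches are spread uniformly through the index.

```python
def candidates_needed(top_k, selectivity):
    """Rough expected number of ANN candidates that must be examined
    so that top_k of them survive a scalar filter, assuming filter
    matches are distributed uniformly across the collection."""
    return round(top_k / selectivity)

# Returning 10 results through a 1%-selective filter means the ANN
# stage has to surface ~1,000 candidates instead of 10.
print(candidates_needed(10, 0.01))    # 1000
# At 0.01% selectivity the candidate set explodes, which is the regime
# where falling back to a brute-force scan of the filtered rows wins.
print(candidates_needed(10, 0.0001))  # 100000
```

This is why a filter that looked harmless in load testing (high selectivity) can spike latency in production when a rare filter value shows up.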
Deletion is the third gotcha. Milvus marks vectors deleted but doesn't reclaim storage immediately — compaction must be triggered explicitly. Systems with high update rates (e.g., refreshing user embeddings) accumulate dead segments that affect both storage costs and query performance if compaction isn't scheduled. These are solvable problems, but they require production experience to anticipate.
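A quick sketch of how dead segments accumulate between compactions. The workload figures (1M user embeddings refreshed daily, 768-dimension float32 vectors) are illustrative assumptions, not numbers from this article.

```python
def dead_bytes_per_day(updates_per_day, dim, bytes_per_value=4):
    """Storage newly occupied by tombstoned vectors each day: every
    refreshed embedding leaves its old copy marked deleted but still
    on disk until compaction reclaims it."""
    return updates_per_day * dim * bytes_per_value

# Refreshing 1M user embeddings daily leaves ~3 GB/day of dead data.
daily = dead_bytes_per_day(1_000_000, 768)
weekly_backlog_gb = 7 * daily / 1e9
print(f"{weekly_backlog_gb:.1f} GB")  # backlog if compaction runs weekly
```

In pymilvus, compaction can be triggered explicitly (the ORM exposes a `Collection.compact()` call), and scheduling that trigger against your update rate is the operational fix the paragraph above alludes to.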
Milvus vs Pinecone vs Weaviate vs pgvector
The right vector database depends on your scale, your team's infrastructure appetite, and whether you need self-hosting.
Pinecone is the fastest path to a working vector search system — fully managed, no infrastructure, and a simple API. It costs more at high query volume and offers no self-hosted option, making it a poor fit for data-sovereignty requirements or billion-scale budgets. Milvus wins on raw throughput, index flexibility, and total cost at scale; Pinecone wins on time-to-first-query and operational simplicity.
Weaviate is open-source with built-in vectorization modules — you can skip calling an external embedding API and let Weaviate handle it. It's a better fit for multi-modal objects and schema-driven data models. Milvus generally outperforms Weaviate on benchmark throughput for pure vector similarity at massive scale.
pgvector is the right answer when your dataset is under ~10M vectors and you already run PostgreSQL. It avoids adding a new system entirely. When query latency starts degrading under concurrent load or dataset growth, Milvus is the natural upgrade path.
Pricing
Milvus is open-source and free to self-host — the cost is infrastructure and the engineering overhead to operate it. Zilliz Cloud, the managed service, offers a permanent Free Tier with 5GB storage and 2.5M vCUs per month — enough for prototypes and small production workloads. Serverless pricing charges $4 per million vCUs consumed. Dedicated clusters start at $99/month.
Starting January 1, 2026, Zilliz Cloud standardized storage at $0.04 per GB/month across AWS, Azure, and GCP and eliminated markup on data transfer fees. A new tiered storage architecture (announced October 2025) delivers an 87% reduction in storage costs for large datasets — a meaningful change for teams storing hundreds of millions of vectors. A Business Critical plan targeting regulated industries (finance, healthcare) adds enhanced security controls; pricing requires contacting sales.
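Using the listed $0.04 per GB/month rate, here is a rough storage-cost estimate for a large deployment. The workload (500M vectors at 768 dimensions, float32) is an illustrative assumption, and the figure covers only the raw vector payload, not index overhead, metadata, or compute.

```python
def raw_vector_gb(num_vectors, dim, bytes_per_value=4):
    """Raw float32 vector payload in GB, ignoring index and metadata."""
    return num_vectors * dim * bytes_per_value / 1e9

PRICE_PER_GB_MONTH = 0.04  # Zilliz Cloud storage rate effective Jan 1, 2026

gb = raw_vector_gb(500_000_000, 768)
monthly_cost = gb * PRICE_PER_GB_MONTH
print(f"{gb:.0f} GB -> ${monthly_cost:.2f}/month")  # 1536 GB -> $61.44/month
```

The takeaway: at the post-2026 rate, raw storage is a small line item even at half a billion vectors; compute and vCU consumption, not storage, dominate the bill.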
Milvus in Fractional and Contract Hiring
Milvus expertise enters the hiring market at a specific inflection point: when a team's RAG or semantic search system moves from prototype to production and the simpler vector store (pgvector, Chroma, or an in-memory FAISS index) starts showing latency or memory limits. That transition — usually triggered by crossing the 10M–50M vector threshold — is where fractional ML engineers and data engineers with Milvus experience provide high leverage.
The skill almost never appears in isolation. Job postings pair Milvus with LangChain or LlamaIndex (orchestration), OpenAI or Hugging Face embedding models (encoding), and Python throughout. On Pangea, we see Milvus requests cluster around three engagement types: index architecture and schema design before a deployment, performance debugging after production latency spikes, and cost optimization work following unexpectedly large cloud bills. These are high-leverage, time-bounded engagements where a week with the right engineer prevents months of operational pain.
The Bottom Line
Milvus is the go-to vector database for teams that need production-grade similarity search at a scale pgvector can't reach. Its open-source foundation, flexible index options, and Kubernetes-native architecture make it the practical choice for billion-vector workloads — but self-hosting carries real operational complexity. For companies hiring through Pangea, Milvus expertise signals an AI engineer who has moved beyond tutorials into production RAG systems, one who understands the gap between benchmark performance and what actually happens when embeddings, filters, and scale interact.
