
Vector DBs for AI Search (2025)

Curated comparison for RAG, hybrid search, and agentic workloads. Pick a scenario to highlight recommendations, view a compact score matrix, and grab ready-to-run compose files.

Scenario: Ship fast with low ops & great DX.

Top pick: Qdrant (Recommended)
Fit score: 88/100

Index Tuning Cheat-Sheet

HNSW

ANN • in-memory

Knobs: M, efConstruction, efSearch

Start with: M=32 • efC=200 • ef=64

↑ef ⇒ ↑recall + ↑latency. Good filters; memory ↑ with M.
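
For concreteness, a minimal Python sketch of these HNSW knobs using FAISS (one common ANN library; the DBs below expose equivalent settings under their own names). The data and parameter values are illustrative only.

import faiss
import numpy as np

dim = 768
xb = np.random.rand(10_000, dim).astype("float32")   # toy corpus

index = faiss.IndexHNSWFlat(dim, 32)   # M=32 graph links per node
index.hnsw.efConstruction = 200        # build-time beam width
index.add(xb)

index.hnsw.efSearch = 64               # query-time beam width: raise for recall, costs latency
distances, ids = index.search(xb[:5], k=20)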

IVF-Flat

ANN • centroids

Knobs: nlist, nprobe

Start with: nlist≈√N • nprobe≈1–5% nlist

Good at large N; tune nprobe for recall/latency.
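
A matching FAISS sketch for the IVF-Flat knobs, again with illustrative values (nlist from √N, nprobe at a few percent of nlist):

import faiss
import numpy as np

dim, n = 768, 10_000
xb = np.random.rand(n, dim).astype("float32")

nlist = int(n ** 0.5)                         # nlist ≈ √N
quantizer = faiss.IndexFlatL2(dim)
index = faiss.IndexIVFFlat(quantizer, dim, nlist)
index.train(xb)                               # learn the centroids
index.add(xb)

index.nprobe = max(1, nlist // 50)            # probe ~2% of lists; raise for recall
distances, ids = index.search(xb[:5], k=20)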

IVF-PQ / OPQ

compression • cost

Knobs: nlist, nprobe, m, bits

Start with: m such that m | dim • 8 bits/subvector

Big memory savings; slight recall loss vs Flat.
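
And the IVF-PQ variant, showing the compression knobs (m must divide dim; 8 bits per subvector as suggested above). Values are illustrative:

import faiss
import numpy as np

dim, n = 768, 20_000
xb = np.random.rand(n, dim).astype("float32")

nlist = int(n ** 0.5)
m, bits = 96, 8                               # 96 subvectors (768 / 96 = 8 dims each), 8 bits each
quantizer = faiss.IndexFlatL2(dim)
index = faiss.IndexIVFPQ(quantizer, dim, nlist, m, bits)
index.train(xb)                               # learns centroids and PQ codebooks
index.add(xb)                                 # stores compressed codes, not raw floats

index.nprobe = max(1, nlist // 20)
distances, ids = index.search(xb[:5], k=20)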

Disk-ANN / tiered

SSD • scale

Knobs: cache %, prefetch

Start with: warm cache for steady p95

SSD-backed; good for 100M+ with cost control.

Quick Sizing & Cost Estimator

Raw vectors: 7.15 GB
Index overhead: 10.73 GB
Total footprint: 17.88 GB

Rules of thumb only. Real footprints vary with M/ef, nlist/nprobe, PQ code size, metadata, and filtering structures.
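
A minimal sketch of the rule of thumb behind these numbers, assuming float32 vectors and an index overhead of roughly 1.5× the raw data. The overhead factor and example inputs are back-derived from the figures above and will vary with the knobs listed in the disclaimer:

def footprint_gib(n_vectors: int, dim: int, overhead_factor: float = 1.5) -> dict:
    raw = n_vectors * dim * 4 / 2**30        # 4 bytes per float32 component
    index = raw * overhead_factor            # graph links / centroids / codes (assumption)
    return {
        "raw_gib": round(raw, 2),
        "index_gib": round(index, 2),
        "total_gib": round(raw + index, 2),
    }

# e.g. 2.5M vectors at 768 dims reproduces the example figures:
# {'raw_gib': 7.15, 'index_gib': 10.73, 'total_gib': 17.88}
print(footprint_gib(2_500_000, 768))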

Benchmark Methodology (template)

{
  "dataset": {
    "name": "internal-wiki",
    "N": 5000000,
    "dim": 768,
    "lang": "en"
  },
  "query": {
    "k": 20,
    "filters": {
      "team": [
        "eng",
        "support"
      ]
    },
    "hybrid": {
      "alpha": 0.6
    }
  },
  "latency": {
    "budget_ms_p95": 80,
    "target_qps": 300,
    "warm": true
  },
  "index": {
    "kind": "HNSW",
    "M": 32,
    "efConstruction": 200,
    "efSearch": 64
  },
  "writeMix": {
    "upserts_per_min": 200,
    "deletes_per_min": 10
  },
  "metrics": [
    "recall@20",
    "p50",
    "p95",
    "p99",
    "error_rate"
  ],
  "rerank": {
    "enabled": true,
    "topK": 50
  }
}

Use the same harness across systems; keep recall targets constant and compare p95/p99 under the same filters/hybrid/rerank settings.
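
To make "same harness" concrete, here is a hedged sketch of the measurement loop: run_query is a placeholder for each system's client call, queries and the exact-search ground truth are assumed to exist, and the outputs map onto the metrics in the template above.

import time
import numpy as np

def recall_at_k(approx_ids, exact_ids, k=20):
    # Fraction of the exact top-k recovered by the ANN result.
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

def measure(queries, run_query, k=20):
    latencies, recalls = [], []
    for q in queries:                                   # q: {"text", "filters", "exact_ids"}
        t0 = time.perf_counter()
        ids = run_query(q["text"], k=k, filters=q.get("filters"))
        latencies.append((time.perf_counter() - t0) * 1000.0)
        recalls.append(recall_at_k(ids, q["exact_ids"], k))
    lat = np.asarray(latencies)
    return {
        "recall@k": float(np.mean(recalls)),
        "p50_ms": float(np.percentile(lat, 50)),
        "p95_ms": float(np.percentile(lat, 95)),
        "p99_ms": float(np.percentile(lat, 99)),
    }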

Hybrid Search Recipes

Weighted fusion
score = α * cosine(vec(q), vec(doc)) + (1 - α) * bm25(q, doc)

Start α at 0.6; tune by validation recall.
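
A small Python sketch of weighted fusion. One practical detail not in the formula: BM25 scores are unbounded, so this version min-max normalizes them before blending (an assumption, not part of the recipe above).

def weighted_fusion(cosine_scores: dict, bm25_scores: dict, alpha: float = 0.6) -> dict:
    lo, hi = min(bm25_scores.values()), max(bm25_scores.values())
    span = (hi - lo) or 1.0
    bm25_norm = {d: (s - lo) / span for d, s in bm25_scores.items()}
    docs = set(cosine_scores) | set(bm25_norm)
    return {
        d: alpha * cosine_scores.get(d, 0.0) + (1 - alpha) * bm25_norm.get(d, 0.0)
        for d in docs
    }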

Reciprocal Rank Fusion (RRF)
RRF(d) = Σ_s 1 / (k + rank_s(d))   # k≈60

Stable blend across multiple rankers (vector, keyword, metadata).
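
And a matching sketch of RRF over any number of rankers; rankings is a list of ranked document-ID lists (vector, keyword, metadata), with k defaulting to 60 as above.

def rrf(rankings: list[list[str]], k: int = 60) -> dict:
    scores: dict[str, float] = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))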

Qdrant • Recommended • Open Source

Rust-powered OSS, fast HNSW with filters

Open-source vector DB built for high RPS and low latency with robust filtering/payloads; managed cloud available.

Score matrix (Fit: 88/100)
Recall 4/5 • Latency 5/5 • Cost 5/5 • Ops 4/5 • Hybrid 4/5

Pros

  • Great speed/recall trade-off
  • Simple filters/payloads
  • Cloud option

Cons

  • Fewer “platform” features
  • Backups/sharding need planning

Tags: RAG • HNSW • Hybrid • Filters • Scale

Pinecone • Recommended • Managed

Serverless, fully managed vector DB

Managed vector database with serverless scale, low-latency query paths, and strong ecosystem integrations for RAG/agents.

Score matrix (Fit: 85/100)
Recall 4/5 • Latency 5/5 • Cost 3/5 • Ops 5/5 • Hybrid 4/5

Pros

  • Zero-ops scaling
  • Great integrations
  • Multi-tenant friendly

Cons

  • Proprietary & usage-priced
  • Less infra control

Tags: RAG • HNSW • Hybrid • Filters • Scale

Weaviate • Recommended • Open Source

AI-native OSS with hybrid search & filters

Open-source vector DB with first-class hybrid (dense+sparse) search, filters, and flexible schema. Cloud or self-hosted.

Score matrix (Fit: 79/100)
Recall 4/5 • Latency 4/5 • Cost 4/5 • Ops 3/5 • Hybrid 5/5

Pros

  • Hybrid + filters
  • Cloud/on-prem
  • Mature tooling

Cons

  • More ops at large scale
  • Tuning depth/learning curve

Tags: RAG • HNSW • Hybrid • Filters • Scale

Milvus (Zilliz Cloud) • Open Source

Scale-first OSS; managed by Zilliz

Vector database for very large corpora and high throughput. Zilliz Cloud provides a managed route to reduce ops.

Score matrix (Fit: 79/100)
Recall 5/5 • Latency 4/5 • Cost 4/5 • Ops 3/5 • Hybrid 4/5

Pros

  • Massive scale
  • Active community + cloud
  • Flexible indexes

Cons

  • Heavier to operate on-prem
  • Overkill for small apps

Tags: RAG • HNSW • Hybrid • Filters • Scale

Elasticsearch • Search Engine

kNN in a battle-tested search engine

Dense vectors + kNN alongside keyword search and analytics. Strong hybrid when you already run Elastic.

Score matrix (Fit: 75/100)
Recall 4/5 • Latency 3/5 • Cost 3/5 • Ops 4/5 • Hybrid 5/5

Pros

  • Keyword+vector in one stack
  • Mature ecosystem
  • Security/observability

Cons

  • Vector feature constraints
  • Pure ANN performance can trail dedicated vector DBs

Tags: RAG • HNSW • Hybrid • Filters • Scale

Vespa • Search Engine

Serving engine with ANN, filters, and ranking pipelines

Production engine with HNSW ANN, advanced ranking pipelines, and strong filtering—great for large catalogs & complex scoring.

Score matrix (Fit: 73/100)
Recall 5/5 • Latency 4/5 • Cost 3/5 • Ops 2/5 • Hybrid 5/5

Pros

  • Advanced ranking/hybrid
  • Large-scale serving patterns

Cons

  • Heavier to operate
  • Higher learning curve

Tags: RAG • HNSW • Hybrid • Filters • Scale

PostgreSQL + pgvector • Database Extension

Bring vector search to Postgres

Postgres extension adding IVFFlat/HNSW vector indexes. Keep vectors next to relational data for simpler stacks.

Score matrix (Fit: 69/100)
Recall 3/5 • Latency 3/5 • Cost 4/5 • Ops 4/5 • Hybrid 3/5

Pros

  • Single DB (SQL+vectors)
  • Great for moderate scale
  • Simple backups

Cons

  • Lower peak perf than specialists
  • Recall/latency hinge on tuning

Tags: RAG • HNSW • Hybrid • Filters • Scale

OpenSearch • Search Engine

Elastic fork with k-NN plugin

Vector search via the k-NN plugin (HNSW/IVF). Solid open-source option, especially in AWS-centric stacks.

Score matrix (Fit: 68/100)
Recall 3/5 • Latency 3/5 • Cost 4/5 • Ops 3/5 • Hybrid 4/5

Pros

  • AWS integration
  • Hybrid search story

Cons

  • Ops/tuning involved
  • Peak ANN performance trails specialists

Tags: RAG • HNSW • Hybrid • Filters • Scale

Quick docker-compose

Qdrant (minimal)

# Qdrant docker-compose (minimal)
version: "3.9"
services:
  qdrant:
    image: qdrant/qdrant:v1.11
    ports:
      - "6333:6333" # REST
      - "6334:6334" # gRPC
    volumes:
      - ./qdrant_storage:/qdrant/storage
    environment:
      QDRANT__SERVICE__GRPC_PORT: 6334
# start:  docker compose up -d
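
Once the container is up, a quick smoke test with the official Python client (pip install qdrant-client; the collection name and vector size below are placeholders):

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="docs",                                   # placeholder name
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)
print(client.get_collections())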

Milvus + deps (standalone)

# Milvus + dependencies (standalone) via docker compose
version: "3.9"
services:
  etcd:
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      ETCD_AUTO_COMPACTION_MODE: revision
      ETCD_AUTO_COMPACTION_RETENTION: "1000"
      ETCD_QUOTA_BACKEND_BYTES: "4294967296"
    volumes:
      - ./volumes/etcd:/etcd
    command: >
      etcd -advertise-client-urls http://etcd:2379
           -listen-client-urls http://0.0.0.0:2379
           -data-dir /etcd
  minio:
    image: minio/minio:RELEASE.2024-05-10T01-41-38Z
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    volumes:
      - ./volumes/minio:/data
    command: server /data --console-address ":9001"
  milvus:
    image: milvusdb/milvus:v2.4.4
    command: ["milvus", "run", "standalone"]
    depends_on: [etcd, minio]
    environment:
      ETCD_ENDPOINTS: etcd:2379    # point standalone at its deps
      MINIO_ADDRESS: minio:9000
    ports:
      - "19530:19530"  # gRPC
      - "9091:9091"    # HTTP
    volumes:
      - ./volumes/milvus:/var/lib/milvus
# start:  docker compose up -d
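
As with Qdrant, a quick smoke test once the stack is healthy (pip install pymilvus):

from pymilvus import connections, utility

connections.connect(alias="default", host="localhost", port="19530")
print(utility.get_server_version())   # confirms the gRPC endpoint answers
print(utility.list_collections())     # empty list on a fresh install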

Notes: adjust versions, mount persistent volumes, and lock ports. For production, add healthchecks, backups, metrics, and TLS.