# Why Not Qdrant (Vector Database)
Evaluated March 2026. Test: 77 markdown files from docs/dev/, 762 chunks, multilingual-e5-base (768d), 8 search queries.
## Test Results
| Metric | Qdrant HNSW | Brute-Force (current) |
|---|---|---|
| Search latency | 2-3ms | 30-35ms |
| Top-1 accuracy vs brute-force | 100% (identical) | baseline |
| Top-5 overlap | 8/8 queries: 5/5 | baseline |
| INT8 quantization quality loss | none (5/5 overlap) | n/a |
| Garbage query score ("xyzzynonexistent") | 0.827 | 0.827 |
| Best real query score | 0.890 | 0.890 |
Results are byte-for-byte identical at this scale. HNSW approximation doesn't kick in meaningfully below ~10K vectors.
## Why We Stay With In-Memory Brute-Force
- **Scale doesn't justify it.** A single site has hundreds to low thousands of notes. At 5K chunks, brute-force takes ~50-100ms, which is imperceptible in search UX.
- **Zero quality difference.** The test showed 100% identical rankings across all query types (technical, bilingual, vague). Qdrant solves a speed problem we don't have.
- **The real problem is the model, not storage.** E5 cosine scores are compressed into the 0.75-1.0 range: garbage queries score 0.82, good matches score 0.87. Qdrant doesn't fix this; it's a model/metric issue. Improvements that actually help:
  - Cross-encoder reranking (second pass)
  - Better score normalization before RRF fusion
  - Hybrid search (BM25 naturally rejects garbage; already implemented)
- **Architectural simplicity.** Go + SQLite monolith, single binary. Adding Qdrant means a Docker container, a data sync pipeline (SQLite → Qdrant), health monitoring, version upgrades, and backup coordination. Each moving part is a failure point.
- **Vectors are already in memory.** Embeddings are loaded at startup for similar-notes features. Brute-force search reuses the same data, with no extra memory or sync cost.
- **No filtering needs (yet).** Qdrant's payload filtering (search within a tag or category) is powerful, but we filter by access permissions in Go after scoring. This works fine at current scale.
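The approach the bullets above describe can be sketched in a few lines of Go. This is a minimal illustration, not the actual codebase: the `Chunk` type, the `allowed` callback, and the 2-dimensional toy vectors are all hypothetical stand-ins (real embeddings here are 768d e5-base vectors).

```go
package main

import (
	"fmt"
	"sort"
)

// Chunk is an illustrative stand-in for an indexed note chunk.
type Chunk struct {
	NoteID int
	Vec    []float32 // L2-normalized embedding
	Score  float32
}

// dot computes the dot product; for L2-normalized vectors this
// equals cosine similarity, so no per-chunk division is needed.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// search scores every chunk (brute force), applies an
// access-permission filter after scoring, and returns the top K.
func search(query []float32, chunks []Chunk, allowed func(noteID int) bool, topK int) []Chunk {
	scored := make([]Chunk, 0, len(chunks))
	for _, c := range chunks {
		c.Score = dot(query, c.Vec)
		if allowed(c.NoteID) {
			scored = append(scored, c)
		}
	}
	sort.Slice(scored, func(i, j int) bool { return scored[i].Score > scored[j].Score })
	if len(scored) > topK {
		scored = scored[:topK]
	}
	return scored
}

func main() {
	chunks := []Chunk{
		{NoteID: 1, Vec: []float32{1, 0}},
		{NoteID: 2, Vec: []float32{0, 1}},
		{NoteID: 3, Vec: []float32{0.6, 0.8}},
	}
	res := search([]float32{1, 0}, chunks, func(int) bool { return true }, 2)
	fmt.Println(res[0].NoteID, res[1].NoteID) // best match first
}
```

The whole index is one slice scan plus a sort: O(n·d) work that stays in the tens of milliseconds at this scale, with no external service to keep in sync.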
## When To Reconsider
| Trigger | Why |
|---|---|
| >10K chunks in a single searchable index | HNSW latency advantage becomes real (sub-ms vs 500ms+) |
| Multi-tenant shared index | Combined vectors across sites could reach 50K+ |
| Need for filtered vector search | Qdrant's native payload filters are more efficient than post-filtering in Go at scale |
| Memory pressure | Offloading vectors to Qdrant frees Go process RAM (~6MB per 1K notes, becomes significant at 50K+) |
| Need for vector-level deduplication or clustering | Qdrant has built-in grouping and recommendation APIs |
## What To Improve Instead
- **Score normalization.** Normalize E5 cosine scores to a meaningful 0-1 range before RRF fusion. The raw 0.75-1.0 compression makes threshold-based filtering unreliable.
- **Post-RRF minimum score.** Apply a relevance threshold after merging BM25 + vector results, not before. BM25 returning zero results for garbage queries naturally pushes junk down.
- **Model evaluation.** Test `multilingual-e5-large` (1024d) or `bge-m3` for better score separation between relevant and irrelevant results.
- **Reranker.** Cross-encoder reranking of the top-20 candidates is expensive but dramatically improves precision. Could run as a second pass in the embedding-server.
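The first two improvements above can be sketched together. This is a hypothetical illustration, not the shipped fusion code: the 0.75 normalization floor, the conventional RRF constant k=60, and the 0.017 cutoff are all assumed values for demonstration.

```go
package main

import (
	"fmt"
	"sort"
)

// normalizeE5 rescales raw E5 cosine scores, which cluster in
// roughly 0.75-1.0, onto 0-1. The 0.75 floor is an assumption.
func normalizeE5(raw float64) float64 {
	const lo, hi = 0.75, 1.0
	n := (raw - lo) / (hi - lo)
	if n < 0 {
		return 0
	}
	return n
}

// rrfFuse merges two ranked ID lists with Reciprocal Rank Fusion:
// score(d) = sum over lists of 1/(k + rank(d)), conventionally k=60.
func rrfFuse(bm25, vector []int, k float64) map[int]float64 {
	scores := map[int]float64{}
	for rank, id := range bm25 {
		scores[id] += 1 / (k + float64(rank+1))
	}
	for rank, id := range vector {
		scores[id] += 1 / (k + float64(rank+1))
	}
	return scores
}

// topAbove sorts fused results and drops anything below minScore
// AFTER merging: a garbage query that BM25 returns nothing for
// gets only a weak single-list RRF score and falls under the cut.
func topAbove(scores map[int]float64, minScore float64) []int {
	ids := make([]int, 0, len(scores))
	for id, s := range scores {
		if s >= minScore {
			ids = append(ids, id)
		}
	}
	sort.Slice(ids, func(i, j int) bool { return scores[ids[i]] > scores[ids[j]] })
	return ids
}

func main() {
	// The doc's garbage (0.827) vs best (0.890) scores spread out
	// after normalization: ~0.31 vs ~0.56 instead of a 0.06 gap.
	fmt.Printf("%.2f %.2f\n", normalizeE5(0.827), normalizeE5(0.890))

	bm25 := []int{}          // garbage query: BM25 finds nothing
	vector := []int{7, 3, 9} // vector search always returns something
	fused := rrfFuse(bm25, vector, 60)
	// cutoff just above a single-list top-1 contribution, 1/61 ≈ 0.0164
	fmt.Println(topAbove(fused, 0.017))
}
```

The key ordering point: documents found by both retrievers accumulate two reciprocal-rank terms, so any sensible post-merge threshold keeps them while single-list stragglers from a garbage query drop out.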
## Test Setup
- Qdrant: `qdrant/qdrant:latest` (Docker)
- Embedding: `intfloat/multilingual-e5-base` via embedding-server (sentence-transformers)
- Chunking: paragraph-level, ~1500 char target, 200 char overlap
- Collection: HNSW `m=16`, `ef_construct=100`, cosine distance
- Script: `scripts/qdrant_test.py`