Search Architecture
Trip2g uses hybrid search: full-text search (FTS) via Bleve combined with vector semantic search via OpenAI embeddings, merged with Reciprocal Rank Fusion (RRF).
Components
Full-Text Search (Bleve)
- In-memory index rebuilt on every vault reload
- Russian morphology analyzer (`ru`) for stemming
- AND operator: all query terms must appear in the document
- Indexes: note title + extracted plain text (frontmatter stripped, code blocks included)
- Result: highlighted title and body snippets
Entry point: `internal/noteloader/search.go` (`buildSearchIndex`, `Search`).
Vector Search
- Embeddings generated by OpenAI (`text-embedding-3-small`, 1536 dims) via a background job
- Stored in the `note_version_embeddings` table (BLOB, float32 little-endian)
- Loaded into `model.NoteView.Embedding` at vault load time via `assignEmbeddings`
- At query time: query text → OpenAI embedding → cosine similarity against all notes that have embeddings
- Threshold: `vectorMinSimilarity = 0.40` (notes below the threshold are excluded)
- Top 30 candidates passed to the RRF merger
Entry point: `internal/case/sitesearch/resolve.go` (`vectorSearch`).
Hybrid Merge (RRF)
Reciprocal Rank Fusion combines FTS and vector results:
score(doc) = Σ 1 / (k + rank) for each list where doc appears
k = 60 (standard RRF constant)
- Rank-based rather than score-based, which avoids having to normalize BM25 and cosine scores onto a common scale
- Vector-only results get a generated text snippet (first 150 chars, frontmatter skipped)
- Final list capped at 20 results
Feature Flag
Vector search requires:
- `features.VectorSearch.Enabled = true` (config)
- OpenAI API key configured
Without these, only FTS runs. FTS always runs regardless.
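The gating rule can be sketched as plain control flow. Only `features.VectorSearch.Enabled` comes from this document. The `Config` shape, the `OpenAIAPIKey` field, and the function names are hypothetical, and the search backends are passed in as functions so the sketch stays self-contained.

```go
package main

import "fmt"

// Hypothetical config shape mirroring features.VectorSearch.Enabled.
type Config struct {
	Features struct {
		VectorSearch struct{ Enabled bool }
	}
	OpenAIAPIKey string
}

// vectorSearchActive requires both the feature flag and an API key.
func vectorSearchActive(cfg Config) bool {
	return cfg.Features.VectorSearch.Enabled && cfg.OpenAIAPIKey != ""
}

// search always runs FTS; vector search and fusion run only when active.
func search(cfg Config, q string,
	fts func(string) []string,
	vec func(string) []string,
	merge func(a, b []string) []string) []string {
	ftsHits := fts(q)
	if !vectorSearchActive(cfg) {
		return ftsHits
	}
	return merge(ftsHits, vec(q))
}

func main() {
	fts := func(string) []string { return []string{"a.md"} }
	vec := func(string) []string { return []string{"b.md"} }
	// Placeholder merge (plain concatenation), standing in for RRF.
	merge := func(a, b []string) []string { return append(append([]string{}, a...), b...) }

	var cfg Config
	fmt.Println(search(cfg, "q", fts, vec, merge)) // [a.md]
	cfg.Features.VectorSearch.Enabled = true
	cfg.OpenAIAPIKey = "sk-test"
	fmt.Println(search(cfg, "q", fts, vec, merge)) // [a.md b.md]
}
```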
Embedding Generation
Background job `generatenoteversionembedding` runs when a note version is created:
- Fetches the note from the in-memory cache
- Computes `sha256(title + raw_content)` and skips the note if its embedding is already up to date
- Sends `title + "\n\n" + raw_content` to OpenAI (this includes the frontmatter YAML)
- Stores the float32 LE bytes, `model_id`, and `content_hash` in `note_version_embeddings`
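The hashing and storage steps can be sketched with the standard library. The function names are hypothetical. Hex-encoding the digest is an assumption about how `content_hash` is stored. The byte layout (4 bytes per float32, little-endian) follows this document, and it explains why `LENGTH(embedding)/4` returns the dimension count in the SQL checks further down.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"math"
)

// contentHash mirrors sha256(title + raw_content); hex output is an assumption.
func contentHash(title, rawContent string) string {
	sum := sha256.Sum256([]byte(title + rawContent))
	return hex.EncodeToString(sum[:])
}

// embeddingInput is the text sent to OpenAI, frontmatter included.
func embeddingInput(title, rawContent string) string {
	return title + "\n\n" + rawContent
}

// needsEmbedding reports whether the stored hash no longer matches the note.
func needsEmbedding(storedHash, title, rawContent string) bool {
	return storedHash != contentHash(title, rawContent)
}

// encodeEmbedding packs each float32 as 4 little-endian bytes.
func encodeEmbedding(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(f))
	}
	return buf
}

// decodeEmbedding reverses encodeEmbedding.
func decodeEmbedding(b []byte) ([]float32, error) {
	if len(b)%4 != 0 {
		return nil, fmt.Errorf("embedding blob length %d is not a multiple of 4", len(b))
	}
	v := make([]float32, len(b)/4)
	for i := range v {
		v[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[4*i:]))
	}
	return v, nil
}

func main() {
	blob := encodeEmbedding([]float32{0.5, -1, 2})
	v, err := decodeEmbedding(blob)
	fmt.Println(len(blob), v, err) // 12 [0.5 -1 2] <nil>

	h := contentHash("Title", "body")
	fmt.Println(needsEmbedding(h, "Title", "body"), needsEmbedding(h, "Title", "body v2")) // false true
}
```

Because the hash concatenates title and body without a separator, the pair ("ab", "c") hashes the same as ("a", "bc"). This collision is unlikely in practice, but hashing the same `title + "\n\n" + raw_content` string that is sent to OpenAI would rule it out.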
Re-generation job: `internal/case/cronjob/regeneratenoteembeddings`.
Search Exclusion
Notes can be excluded from search index:
---
search: false
---
System notes (path starts with `_` or contains `/_`) are also excluded from FTS indexing.
Exclusion is applied during buildSearchIndex; vector search currently does not apply this filter at query time.
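The two exclusion rules can be combined into a single predicate. The function names are hypothetical, and the frontmatter flag is modeled as a `*bool` that is nil when the note has no `search:` key. Because vector search does not apply this filter at query time, a note for which this predicate returns true can still surface through the vector path.

```go
package main

import (
	"fmt"
	"strings"
)

// isSystemNote applies the documented rule: path starts with "_" or contains "/_".
func isSystemNote(path string) bool {
	return strings.HasPrefix(path, "_") || strings.Contains(path, "/_")
}

// excludedFromFTS combines the system-note rule with `search: false`.
func excludedFromFTS(path string, searchFlag *bool) bool {
	if isSystemNote(path) {
		return true
	}
	return searchFlag != nil && !*searchFlag
}

func main() {
	off := false
	fmt.Println(excludedFromFTS("_system/config.md", nil)) // true
	fmt.Println(excludedFromFTS("docs/_drafts/x.md", nil)) // true
	fmt.Println(excludedFromFTS("docs/guide.md", &off))    // true
	fmt.Println(excludedFromFTS("docs/guide.md", nil))     // false
}
```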
Testing
Manual golden set
To verify search quality, run the server and check these query → expected note pairs:
| Query | Expected note |
|---|---|
| как учёные ищут планеты у других звёзд | demo/search_astronomy.md |
| ключевые слова теги | demo/search_keywords.md |
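The golden set can be turned into a repeatable check. This sketch makes two assumptions: a case passes when the expected note appears in the top `n` results, and `search` is any function that returns ranked note paths (for example, a hypothetical wrapper around the server's search endpoint). The names `goldenCase` and `checkGolden` are hypothetical.

```go
package main

import "fmt"

type goldenCase struct {
	Query, Expected string
}

var golden = []goldenCase{
	{"как учёные ищут планеты у других звёзд", "demo/search_astronomy.md"},
	{"ключевые слова теги", "demo/search_keywords.md"},
}

// checkGolden returns one message per case whose expected note is missing
// from the top n results.
func checkGolden(search func(query string) []string, n int) []string {
	var failures []string
	for _, c := range golden {
		results := search(c.Query)
		if len(results) > n {
			results = results[:n]
		}
		found := false
		for _, p := range results {
			if p == c.Expected {
				found = true
				break
			}
		}
		if !found {
			failures = append(failures, fmt.Sprintf("%q: %s not in top %d", c.Query, c.Expected, n))
		}
	}
	return failures
}

func main() {
	// Fake search that only ever finds the astronomy note.
	fake := func(q string) []string { return []string{"demo/search_astronomy.md"} }
	for _, f := range checkGolden(fake, 5) {
		fmt.Println("FAIL", f)
	}
}
```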
Calibrating the similarity threshold
Add temporary debug logging in vectorSearch to see real similarity values:
env.Logger().Debug("similarity", "path", note.Path, "sim", similarity)
Run a few representative queries and collect the similarity distribution:
- Relevant notes: typically 0.40–0.60 for Russian question→document pairs with `text-embedding-3-small`
- Noise: typically < 0.35
Set `vectorMinSimilarity` inside the gap between the two groups.
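Picking a value inside the gap can be automated once similarities have been collected from the debug log. Choosing the midpoint between the highest noise score and the lowest relevant score is one reasonable choice, not something this document prescribes. The function name is hypothetical, and the numbers in `main` are illustrative, not measurements.

```go
package main

import "fmt"

// gapThreshold returns the midpoint between the highest noise similarity and
// the lowest relevant similarity. ok is false when either group is empty or
// the groups overlap, in which case no single threshold separates them.
func gapThreshold(relevant, noise []float64) (threshold float64, ok bool) {
	if len(relevant) == 0 || len(noise) == 0 {
		return 0, false
	}
	minRel := relevant[0]
	for _, s := range relevant[1:] {
		if s < minRel {
			minRel = s
		}
	}
	maxNoise := noise[0]
	for _, s := range noise[1:] {
		if s > maxNoise {
			maxNoise = s
		}
	}
	if maxNoise >= minRel {
		return 0, false
	}
	return (maxNoise + minRel) / 2, true
}

func main() {
	// Illustrative values only.
	relevant := []float64{0.42, 0.55, 0.48}
	noise := []float64{0.21, 0.33, 0.30}
	if th, ok := gapThreshold(relevant, noise); ok {
		fmt.Printf("suggested vectorMinSimilarity ≈ %.3f\n", th)
	} else {
		fmt.Println("groups overlap; no clean threshold")
	}
}
```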
Checking embedding freshness
-- Notes where latest version has no embedding
SELECT np.value, nv.id
FROM note_paths np
JOIN note_versions nv ON nv.path_id = np.id AND np.version_count = nv.version
LEFT JOIN note_version_embeddings nve ON nve.version_id = nv.id
WHERE nve.version_id IS NULL AND np.hidden_by IS NULL;
-- Embedding model distribution
SELECT model_id, COUNT(*) as cnt, LENGTH(embedding)/4 as dims
FROM note_version_embeddings
GROUP BY model_id, LENGTH(embedding)/4;