
Search Architecture

Trip2g uses hybrid search: full-text search (FTS) via Bleve combined with vector semantic search via OpenAI embeddings, merged with Reciprocal Rank Fusion (RRF).

Components

Full-Text Search (Bleve)

In-memory index rebuilt on every vault reload
Russian morphology analyzer (ru) for stemming
AND operator — all query terms must appear in the document
Indexes: note title + extracted plain text (frontmatter stripped, code blocks included)
Result: highlighted title and body snippets

Entry point: internal/noteloader/search.go — buildSearchIndex, Search.
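The AND semantics can be illustrated without Bleve. The sketch below is a hypothetical stdlib-only stand-in: it lowercases, splits on non-letter/non-digit runes, and requires every query term to appear in title + body. It deliberately skips stemming, which the real `ru` analyzer performs, so "тег" will not match "теги" here even though it would in Bleve.

```go
package main

import (
	"strings"
	"unicode"
)

// tokenize lowercases text (Unicode-aware, so Cyrillic works) and splits it
// on every rune that is neither a letter nor a digit.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// matchesAll reports whether every query term occurs in the title or body
// (AND operator). Hypothetical helper: no stemming, unlike Bleve's "ru" analyzer.
func matchesAll(query, title, body string) bool {
	terms := make(map[string]struct{})
	for _, t := range tokenize(title + " " + body) {
		terms[t] = struct{}{}
	}
	q := tokenize(query)
	if len(q) == 0 {
		return false
	}
	for _, t := range q {
		if _, ok := terms[t]; !ok {
			return false // one missing term rejects the document
		}
	}
	return true
}
```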

Vector Search

Embeddings generated by OpenAI (text-embedding-3-small, 1536 dims) via background job
Stored in note_version_embeddings table (BLOB, float32 LE)
Loaded into model.NoteView.Embedding at vault load time via assignEmbeddings
At query time: query text → OpenAI embedding → cosine similarity against all notes with embeddings
Threshold: vectorMinSimilarity = 0.40 (notes below threshold are excluded)
Top 30 candidates passed to RRF merger

Entry point: internal/case/sitesearch/resolve.go — vectorSearch.

Hybrid Merge (RRF)

Reciprocal Rank Fusion combines FTS and vector results:

score(doc) = Σ 1 / (k + rank)   for each list where doc appears
k = 60 (standard RRF constant)
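A worked example of the formula, assuming ranks are 1-based (rank 1 = best hit in a list). A note ranked 1st by FTS and 3rd by vector search outscores a note that is 1st in only one list:

```latex
\text{score}(d) = \sum_{L \ni d} \frac{1}{k + \text{rank}_L(d)}, \qquad k = 60

\text{score}(d_{\text{both}}) = \frac{1}{60 + 1} + \frac{1}{60 + 3} \approx 0.016393 + 0.015873 = 0.032266

\text{score}(d_{\text{FTS only}}) = \frac{1}{60 + 1} \approx 0.016393
```

With k = 60 the gap between adjacent ranks is small, so agreement between the two lists matters more than the exact position within one list.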

Rank-based, not score-based: avoids having to normalize Bleve relevance scores against cosine similarities, which are on different scales
Vector-only results get a generated text snippet (first 150 chars, frontmatter skipped)
Final list capped at 20 results
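The merge rules above can be sketched as follows. `rrfMerge` is a hypothetical helper that takes each result list as note paths ordered best-first; it assumes 1-based ranks, deduplicated lists, and path order as a tie-breaker, none of which this doc specifies.

```go
package main

import "sort"

const (
	rrfK       = 60 // standard RRF constant
	maxResults = 20 // final cap
)

// rrfMerge fuses best-first lists of note paths with Reciprocal Rank Fusion.
// Each list is assumed to contain a path at most once.
func rrfMerge(lists ...[]string) []string {
	scores := make(map[string]float64)
	for _, list := range lists {
		for i, path := range list {
			scores[path] += 1.0 / float64(rrfK+i+1) // i is 0-based, rank is i+1
		}
	}
	merged := make([]string, 0, len(scores))
	for path := range scores {
		merged = append(merged, path)
	}
	sort.Slice(merged, func(i, j int) bool {
		if scores[merged[i]] != scores[merged[j]] {
			return scores[merged[i]] > scores[merged[j]]
		}
		return merged[i] < merged[j] // deterministic tie-break
	})
	if len(merged) > maxResults {
		merged = merged[:maxResults]
	}
	return merged
}
```

For example, `rrfMerge([]string{"a", "b", "c"}, []string{"c", "d"})` puts `c` first because it appears in both lists.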

Feature Flag

Vector search requires:

features.VectorSearch.Enabled = true (config)
OpenAI API key configured

If either is missing, search falls back to FTS alone; FTS runs in every case.
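The gating rule amounts to a single boolean. In this sketch the `Features` and `VectorSearchConfig` types and the `searchPlan` function are hypothetical shapes modeled on the `features.VectorSearch.Enabled` path; the real config types may differ.

```go
package main

// Hypothetical config shapes mirroring features.VectorSearch.Enabled.
type VectorSearchConfig struct{ Enabled bool }
type Features struct{ VectorSearch VectorSearchConfig }

// searchPlan decides which search branches run for a query.
func searchPlan(f Features, openAIKey string) (runFTS, runVector bool) {
	runFTS = true // FTS always runs
	runVector = f.VectorSearch.Enabled && openAIKey != ""
	return runFTS, runVector
}
```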

Embedding Generation

Background job generatenoteversionembedding runs when a note version is created:

Fetches note from in-memory cache
Computes sha256(title + raw_content) and skips the job if the stored content_hash already matches
Sends title + "\n\n" + raw_content to OpenAI (includes frontmatter YAML)
Stores float32 LE bytes + model_id + content_hash in note_version_embeddings
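The hashing, input construction and storage encoding from the steps above can be sketched as below. The helper names are hypothetical, hex encoding of the hash is an assumption, and the sketch ignores `model_id` (a real freshness check might also compare it). Note that the hash covers `title + raw_content` with no separator, while the text sent to OpenAI inserts a blank line between the two.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"math"
)

// contentHash computes sha256(title + raw_content); hex output is an assumption.
func contentHash(title, raw string) string {
	sum := sha256.Sum256([]byte(title + raw))
	return hex.EncodeToString(sum[:])
}

// embeddingInput builds the text sent to OpenAI: title, blank line, raw
// content (frontmatter YAML included, unlike the FTS index).
func embeddingInput(title, raw string) string {
	return title + "\n\n" + raw
}

// needsEmbedding reports whether the stored hash is stale for this content.
func needsEmbedding(storedHash, title, raw string) bool {
	return storedHash != contentHash(title, raw)
}

// encodeEmbedding serializes a vector as little-endian float32 bytes for the BLOB column.
func encodeEmbedding(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, x := range v {
		binary.LittleEndian.PutUint32(buf[i*4:], math.Float32bits(x))
	}
	return buf
}
```

A 1536-dimension vector encodes to 6144 bytes, which is why the SQL check later in this doc divides `LENGTH(embedding)` by 4.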

Re-generation job: internal/case/cronjob/regeneratenoteembeddings.

Search Exclusion

Notes can be excluded from the search index via frontmatter:

---
search: false
---

System notes (path starts with _ or contains /_) are also excluded from FTS indexing.
Exclusion is applied only during buildSearchIndex; vector search does not currently apply this filter at query time, so excluded notes that have embeddings can still appear as vector-only results.

Testing

Manual golden set

To verify search quality, run the server and check these query → expected note pairs:

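Query (enter as written)	English gloss	Expected note
как учёные ищут планеты у других звёзд	how scientists search for planets around other stars	`demo/search_astronomy.md`
ключевые слова теги	keywords, tags	`demo/search_keywords.md`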
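If the manual check becomes tedious, the golden set can be encoded as data. In this sketch, `search` is a hypothetical function you would supply (for example, one wrapping a request to the running server) that returns the result paths for a query. It also assumes a pass means "expected note appears anywhere in the results", since the doc does not state a required position.

```go
package main

type goldenCase struct {
	Query    string
	Expected string // note path that should appear in the results
}

var goldenSet = []goldenCase{
	{"как учёные ищут планеты у других звёзд", "demo/search_astronomy.md"},
	{"ключевые слова теги", "demo/search_keywords.md"},
}

// failingQueries runs each golden query through search and returns the
// queries whose expected note is missing from the results.
func failingQueries(search func(query string) []string, cases []goldenCase) []string {
	var failed []string
	for _, c := range cases {
		found := false
		for _, p := range search(c.Query) {
			if p == c.Expected {
				found = true
				break
			}
		}
		if !found {
			failed = append(failed, c.Query)
		}
	}
	return failed
}
```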

Calibrating the similarity threshold

Add temporary debug logging in vectorSearch to see actual similarity values:

env.Logger().Debug("similarity", "path", note.Path, "sim", similarity)

Run a few representative queries and collect the similarity distribution:

Relevant notes: typically 0.40–0.60 for Russian question-to-document queries with text-embedding-3-small
Noise: typically < 0.35

Set vectorMinSimilarity at the gap between the two groups.
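One way to turn the collected values into a number is to take the midpoint between the highest noise similarity and the lowest relevant one. `gapThreshold` is a hypothetical helper, and the midpoint is only one reasonable choice; an error signals that the groups overlap and no clean gap exists.

```go
package main

import "errors"

// gapThreshold returns the midpoint between max(noise) and min(relevant),
// or an error when either sample is empty or the two groups overlap.
func gapThreshold(relevant, noise []float64) (float64, error) {
	if len(relevant) == 0 || len(noise) == 0 {
		return 0, errors.New("need similarity samples from both groups")
	}
	minRel := relevant[0]
	for _, s := range relevant[1:] {
		if s < minRel {
			minRel = s
		}
	}
	maxNoise := noise[0]
	for _, s := range noise[1:] {
		if s > maxNoise {
			maxNoise = s
		}
	}
	if maxNoise >= minRel {
		return 0, errors.New("groups overlap: no clean gap")
	}
	return (maxNoise + minRel) / 2, nil
}
```

With relevant similarities {0.42, 0.55, 0.48} and noise {0.21, 0.33, 0.30}, this yields 0.375.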

Checking embedding freshness

-- Notes where latest version has no embedding
SELECT np.value, nv.id
FROM note_paths np
JOIN note_versions nv ON nv.path_id = np.id AND np.version_count = nv.version
LEFT JOIN note_version_embeddings nve ON nve.version_id = nv.id
WHERE nve.version_id IS NULL AND np.hidden_by IS NULL;

-- Embedding model distribution
SELECT model_id, COUNT(*) as cnt, LENGTH(embedding)/4 as dims
FROM note_version_embeddings
GROUP BY model_id, LENGTH(embedding)/4;