Search refactoring — next steps (handoff)
This is a handoff for the next agent continuing the vector-search work. Read it together with docs/dev/search_refactoring.md (the running report with measured before/after numbers) and docs/dev/retrieval_eval.md (how the benchmark works).
What is already done (committed on branch feat/vector-search-benchmark)
- Benchmark harness: `internal/retrievaleval/` (metrics, golden-set loader, GraphQL search client, report), CLI `cmd/evalretrieval/`, isolated stack `docker-compose.vecbench.yml` + `scripts/vecbench.sh`, two golden sets (`testdata/eval/golden_set.json` = 60 short-corpus queries, `testdata/eval/golden_set_longdocs.json` = 16 long-doc queries).
- F1 widen RRF fusion pool (`vectorTopK` 5→50, `mcpDefaultVectorSearchLimit` 10→50): recall 0.983→1.0. Shipped.
- F2 per-language bleve analyzer (en + ru, dual-field + disjunction query): correctness fix, ~0 metric delta on this corpus. Shipped.
- F3 cross-encoder reranker (`internal/reranker/`, `reranker-server/`): measured worse (promotes near-neighbour distractors). Shipped OFF by default behind `vector_search.reranker.enabled`.
- F4 chunk heading breadcrumb (`{title} > {h1} > {h2}\n\n{body}`) + token-aware sizing: long-doc nDCG@10 0.9308→0.9539 (en→en +0.09). Shipped.
Baselines to beat: short corpus nDCG@10 0.9263, long-doc nDCG@10 0.9539 (both reranker-off, post-F4).
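For orientation, the nDCG@10 figures above follow the usual definition: discounted gain of the returned order divided by that of the ideal order. The sketch below is a minimal illustration, not the code in `internal/retrievaleval/`; the function names and the linear-gain variant are assumptions, and the harness may use a different gain formula.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// dcgAtK sums gain/log2(rank+1) over the first k positions (rank is 1-based).
func dcgAtK(gains []float64, k int) float64 {
	if k > len(gains) {
		k = len(gains)
	}
	sum := 0.0
	for i := 0; i < k; i++ {
		sum += gains[i] / math.Log2(float64(i)+2)
	}
	return sum
}

// ndcgAtK divides the DCG of the returned order by the DCG of the ideal order.
// ranked holds the gain of each returned result, in returned order;
// allGains holds the gain of every relevant document for the query.
// Both names are hypothetical, chosen for this sketch.
func ndcgAtK(ranked, allGains []float64, k int) float64 {
	ideal := append([]float64(nil), allGains...)
	sort.Sort(sort.Reverse(sort.Float64Slice(ideal)))
	idcg := dcgAtK(ideal, k)
	if idcg == 0 {
		return 0
	}
	return dcgAtK(ranked, k) / idcg
}

func main() {
	// One relevant note, returned at rank 2 instead of rank 1.
	fmt.Printf("%.4f\n", ndcgAtK([]float64{0, 1, 0}, []float64{1}, 10))
}
```

A query whose single relevant note lands at rank 2 scores about 0.63, which is why a few near-neighbour promotions are enough to move an average that sits above 0.92.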
How to run the benchmark (important gotchas)
./scripts/vecbench.sh up # build + fresh stack + push vault + embed + reload
./scripts/vecbench.sh rebuild # after a Go CODE change (no re-embed)
./scripts/vecbench.sh sync # after a VAULT change (re-push + embed + reload)
./scripts/vecbench.sh eval <label> <out.json> # run golden_set.json (short) as admin
./scripts/vecbench.sh down
Endpoint: `http://localhost:21081/_system/graphql` (the `/graphql` alias is deprecated; always use `/_system/graphql`). Eval must run as admin (anonymous search returns only live notes; the vault is draft/latest); `vecbench.sh eval` handles the dev sign-in (code 111111) and bearer token. To eval the long-doc set, run the CLI directly with `-golden testdata/eval/golden_set_longdocs.json` and a bearer token.
Critical gotchas (these cost hours):
- Chunker changes don't re-embed on `sync`. The embedding job dedups on the note content hash; changing chunker CODE doesn't change note content, so nothing re-embeds. To force it: wipe the DB (`rm tmp/vecbench-data/vecbench.sqlite3*`) and `up`.
- Stale `.sync-state.json`. obsidian-sync records what it already pushed; against a wiped DB it pushes nothing. `cmd_up` now `rm`s `testdata/vecbench/vault/.sync-state.json` before pushing; keep that.
- In-memory chunk cache is stale until reload. Embeddings are generated by async jobs after sync; the app must reload to see them. `vecbench.sh` drains jobs (`/debug/wait_all_jobs`) then restarts the app; keep that flow.
- Concurrent edits: another session has uncommitted changes to `cmd/server/main.go`, `internal/appconfig/config.go`, and `docs/marketing/*`. Do NOT touch those files, and commit only your own files with explicit paths.
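The first gotcha is easier to remember with the dedup rule written out. This is an illustrative sketch only: `contentHash`, `needsEmbedding` and the choice of SHA-256 are assumptions, not the embedding job's actual code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash is the dedup key in this sketch: it depends only on the note text.
// SHA-256 is an assumption; the real job may hash differently.
func contentHash(noteContent string) string {
	sum := sha256.Sum256([]byte(noteContent))
	return hex.EncodeToString(sum[:])
}

// needsEmbedding reports whether a note must be (re-)embedded, given the set of
// hashes that already have embeddings. Hypothetical helper for illustration.
func needsEmbedding(embedded map[string]bool, noteContent string) bool {
	return !embedded[contentHash(noteContent)]
}

func main() {
	note := "# Title\n\nBody text."
	embedded := map[string]bool{contentHash(note): true}

	// Chunker code changed but the note text did not: same key, job skipped.
	fmt.Println(needsEmbedding(embedded, note))

	// After wiping the DB the set is empty, so the note is embedded again.
	fmt.Println(needsEmbedding(map[string]bool{}, note))
}
```

Because the key never sees the chunker, only a wiped DB (plus a fresh `.sync-state.json`) makes new chunk boundaries reach the index.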
Remaining work items
1. F5 — AND→OR fallback + cosine norm precompute (effort S, quality ~0 / perf win)
- Files: `internal/noteloader/search.go` (`Search`, the `MatchQueryOperatorAnd` at ~line 146), `internal/case/sitesearch/resolve.go` (`vectorSearch` ~120, `cosineSimilarity` ~335), `internal/model/chunk.go` (add `Norm float32` to `NoteChunk`), `internal/noteloader/loader.go` (set norm at load).
- Do: if the AND bleve query returns too few hits, re-run with `MatchQueryOperatorOr` and merge. Precompute each chunk's L2 norm at load (the embedding server is expected to return normalized vectors; verify via the sentence-transformers `normalize_embeddings=True` setting in `embedding-server/server.py`; if already unit-norm, switch cosine to a plain dot product). Add a Warn log of `len(chunks)` + scan duration in `vectorSearch`.
- Expect: ~0 quality delta on this corpus (recall already maxed), latency win. Measure on both golden sets, record in `search_refactoring.md`, commit `perf(search): AND->OR fallback + precomputed norms (F5)`.
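The norm precompute can be sketched as follows. This is a minimal illustration under assumptions: the `NoteChunk` here carries only the fields the sketch needs, and `dot`, `l2Norm`, `isUnitNorm` and `cosine` are hypothetical helpers rather than the existing `cosineSimilarity`.

```go
package main

import (
	"fmt"
	"math"
)

// NoteChunk mirrors only what this sketch needs; Norm is the field F5 proposes.
type NoteChunk struct {
	Embedding []float32
	Norm      float32
}

// dot assumes both vectors have the same length.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

func l2Norm(v []float32) float32 {
	return float32(math.Sqrt(float64(dot(v, v))))
}

// isUnitNorm is the check to run on stored embeddings before dropping the division.
func isUnitNorm(v []float32) bool {
	return math.Abs(float64(l2Norm(v))-1) < 1e-3
}

// cosine uses precomputed norms, so the scan costs one dot product per chunk.
func cosine(query []float32, queryNorm float32, c NoteChunk) float32 {
	if queryNorm == 0 || c.Norm == 0 {
		return 0
	}
	return dot(query, c.Embedding) / (queryNorm * c.Norm)
}

func main() {
	c := NoteChunk{Embedding: []float32{3, 4}}
	c.Norm = l2Norm(c.Embedding) // computed once, at load time

	q := []float32{3, 4}
	fmt.Println(cosine(q, l2Norm(q), c), isUnitNorm(c.Embedding))
}
```

If `isUnitNorm` holds for every stored chunk and for the query vector, `cosine` reduces to `dot` and the `Norm` field becomes a safety net rather than a requirement.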
2. Option A — graphql_request returns MCP structuredContent (effort S)
The capability the user wants ("pass a GraphQL selection, get exactly that JSON") already exists as the `graphql_request` MCP tool. gqlgen is the projector (returns only selected fields) via `env.GraphQLRequest` (`cmd/server/main.go:3013`). The only gap is the return shape.
- File: `internal/case/mcp/graphql_tools.go`, `handleGraphQLRequest` (lines 53–68).
- Do: after `result, err := env.GraphQLRequest(...)`, `json.Unmarshal(result, &parsed)` and return `structuredToolResult(stub, parsed)` (helper at `resolve.go:90`). Pass the full `{data,errors}` envelope as structuredContent so GraphQL errors aren't hidden.
- ⚠ TOKEN DUPLICATION (the user flagged this): `structuredToolResult` currently sets BOTH `Content` text AND `StructuredContent` to the payload → the model sees the JSON twice (~2× tokens). Do NOT pass the full JSON as the text arg. Instead pass a short stub as text (e.g. `"structured result"` or an empty string) and the parsed JSON as structuredContent, so the payload appears once. The whole point (fuzzy pointer + projection) is token economy; don't undo it. Verify which the trip2g MCP client actually feeds to the model and pick the single representation accordingly.
- Test: `internal/case/mcp/graphql_tools_test.go`: assert `StructuredContent` is set and the text block is the stub, not a second copy.
- Commit: `feat(mcp): graphql_request returns structuredContent (single-copy)`.
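The single-copy shape can be sketched like this. It is a hedged illustration: `toolResult` is a hypothetical stand-in for the MCP result type and `structuredGraphQLResult` is an invented name; the real change goes through `structuredToolResult` in the mcp package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolResult is a hypothetical stand-in for the MCP tool result type.
type toolResult struct {
	Text              string
	StructuredContent any
}

const stubText = "structured result"

// structuredGraphQLResult parses the raw GraphQL response and returns it once,
// as structured content, with a short stub in the text block.
// The whole {data, errors} envelope is kept so GraphQL errors stay visible.
func structuredGraphQLResult(raw []byte) (toolResult, error) {
	var envelope map[string]any
	if err := json.Unmarshal(raw, &envelope); err != nil {
		return toolResult{}, fmt.Errorf("graphql response is not a JSON object: %w", err)
	}
	return toolResult{Text: stubText, StructuredContent: envelope}, nil
}

func main() {
	raw := []byte(`{"data":{"note":{"title":"A"}},"errors":[{"message":"x"}]}`)
	res, err := structuredGraphQLResult(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(res.Text, len(res.StructuredContent.(map[string]any)))
}
```

The test in `graphql_tools_test.go` can assert the same two properties: the text block equals the stub, and the structured content still contains both `data` and `errors`.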
3. Query-only + root-field whitelist guard on graphql_request (effort S, the real deliverable)
`env.GraphQLRequest` forges an admin token (`appreq.WithAdminToken`, `cmd/server/main.go:3015`), so an arbitrary GraphQL string can run mutations (createAdmin, banUser, delete notes) and read admin-only data. It's gated behind the per-key `enableMcpAdminTools` flag today, but before any agent loop uses it, add guards.
- File: `internal/case/mcp/graphql_tools.go` (validate before calling `env.GraphQLRequest`). Do the validation in the mcp package (do NOT modify `main.go`; concurrent edits).
- Do: parse the query with `github.com/vektah/gqlparser/v2/parser` `ParseQuery` (no schema needed). Reject if any `OperationDefinition.Operation != Query` (block Mutation/Subscription, incl. multi-operation smuggling). Whitelist top-level selection field names to a read set: `note, search, similarNotes, viewer, notePaths, resolveWikilinks` (confirm exact root field names in `internal/graph/schema.graphqls`); reject `admin` and anything else. Log every call (operation string + key owner) for audit.
- Test: reject a mutation, reject `{ admin { ... } }`, allow `{ note(...) { title } }`.
- Commit: `feat(mcp): query-only + root-field allowlist for graphql_request`.
- Stretch (separate, larger; do NOT bundle): run the query in the caller's context instead of admin so `CanReadNote`/subgraph gates apply. This is a broad change (the whole MCP read model equates API key with admin, `resolve.go:485`); leave as a follow-up.
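The decision logic of the guard can be sketched independently of the parser. Everything here is an assumption made for illustration: `operation` is a hypothetical flattened view of what `ParseQuery` returns (the real code would walk the gqlparser AST), `validateReadOnly` is an invented name, and the allowlist still has to be confirmed against `schema.graphqls`.

```go
package main

import (
	"errors"
	"fmt"
)

// operation is a hypothetical flattened view of one parsed operation.
// Kind is "query", "mutation" or "subscription". RootFields holds the real
// top-level field names (not aliases); root-level fragments are assumed to
// have been rejected or expanded before this point.
type operation struct {
	Kind       string
	RootFields []string
}

// allowedRootFields is the read set from this handoff, pending confirmation.
var allowedRootFields = map[string]bool{
	"note": true, "search": true, "similarNotes": true,
	"viewer": true, "notePaths": true, "resolveWikilinks": true,
}

// validateReadOnly rejects the whole document if any operation is not a query
// or touches a root field outside the allowlist.
func validateReadOnly(ops []operation) error {
	if len(ops) == 0 {
		return errors.New("no operations in document")
	}
	for _, op := range ops {
		if op.Kind != "query" {
			return fmt.Errorf("operation type %q is not allowed", op.Kind)
		}
		if len(op.RootFields) == 0 {
			return errors.New("query has no plain root fields")
		}
		for _, name := range op.RootFields {
			if !allowedRootFields[name] {
				return fmt.Errorf("root field %q is not allowed", name)
			}
		}
	}
	return nil
}

func main() {
	fmt.Println(validateReadOnly([]operation{{Kind: "query", RootFields: []string{"note"}}}))
	fmt.Println(validateReadOnly([]operation{{Kind: "query", RootFields: []string{"admin"}}}))
}
```

Checking every operation in the document, not only the one selected by `operationName`, is what closes the multi-operation smuggling case. Matching on the field name rather than the alias keeps `{ note: admin { ... } }` from slipping through, and introspection roots such as `__schema` fail the allowlist by default.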
4. Harden tocPathForSnippet — make the fuzzy pointer reliable (effort S–M)
This is the linchpin of the "fuzzy → exact section" chain. It currently substring-matches the search snippet against rendered HTML sections and can silently return `nil`, dropping the breadcrumb.
- File: `internal/case/mcp/toc_path.go` (`tocPathForSnippet` ~57, `findDeepestSection` / `matchHeaderDiv` ~79).
- Do:
  - (a) Normalize both sides with ONE shared function before the `strings.Contains` match. This is the core fix; today the two sides drift:
    - The asymmetry to remove: the snippet side (`toc_path.go:62`) extracts text with `htmlPlainText(markedContext(snippet))`, but the section side (`matchHeaderDiv`, `toc_path.go:89`) uses `htmlNodeText(n)`. Two different extractors over two different HTML pipelines (bleve highlight fragment vs goldmark-rendered note HTML) → the same words can serialize differently, so `Contains` fails.
    - Write one `normalizeForMatch(s string) string` and call it on BOTH the snippet target and each section's text. It must, in this order:
      - extract plain text the same way for both (pick one path, e.g. parse with `golang.org/x/net/html` and concat text nodes; don't use `htmlPlainText` on one side and `htmlNodeText` on the other);
      - `html.UnescapeString` to decode entities (`&amp;`→`&`, `&nbsp;`→U+00A0, `&#39;`→`'`); the bleve fragment and goldmark HTML escape differently, so decode on both sides;
      - replace non-breaking / zero-width / soft-hyphen runes (U+00A0, U+200B, U+00AD, U+FEFF) with a normal space (or drop the zero-width ones); `strings.Fields` treats U+00A0 as space but not U+200B/U+00AD;
      - Unicode-normalize to NFC (`golang.org/x/text/unicode/norm`) so composed vs decomposed Cyrillic/Latin diacritics match;
      - `strings.ToLower`;
      - collapse whitespace via `strings.Join(strings.Fields(s), " ")`.
    - Optionally also fold typographic punctuation (smart quotes `“”’` → `"'`, en/em dash → `-`) since goldmark may smart-quote while bleve keeps ASCII. Keep this conservative: only quotes/dashes, don't strip all punctuation (that over-matches).
    - Replace line 62 with `target := normalizeForMatch(htmlOf(markedContext(snippet)))` and line 89 with `sectionText := normalizeForMatch(innerHTMLOf(n))`, both feeding the same extractor+normalizer. Add a unit test that the same logical text rendered by the two pipelines normalizes to byte-identical strings.
  - (b) Add a robust fallback using F4: the chunk content now starts with the breadcrumb `{title} > {section} > {subsection}`. The `match_id` (`p{pid}:c{chunk}`) identifies the chunk; read that chunk's breadcrumb and derive the `toc_path` directly, with no fuzzy text match needed. This always yields a path.
  - (c) Never return `nil`: fall back to the note root / first heading.
  - Invariant: `sectionHTMLByTocPath` / `navigateSectionPath` match `data-header` by exact title string; the emitted path must use the same heading strings as the `data-header` attribute. Don't introduce normalization drift between emit and lookup.
- Test: a snippet whose HTML reflows differently still resolves; an intro-before-first-heading match still gets a path; a chunk-index fallback path round-trips through `sectionHTMLByTocPath`.
- Commit: `fix(mcp): robust toc_path from snippet (normalize + chunk-breadcrumb fallback)`.
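The text-level steps of the shared normalizer in (a) can be sketched with the standard library alone. Two assumptions are deliberate and should be lifted in the real function: plain text is taken as already extracted from the HTML by one shared extractor, and the NFC step (`golang.org/x/text/unicode/norm`) is omitted to keep the example dependency-free. The punctuation folding is the optional conservative variant, and this sketch drops the zero-width and soft-hyphen runes rather than replacing them with a space.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// normalizeForMatch applies entity decoding, invisible-rune cleanup,
// conservative punctuation folding, lowercasing and whitespace collapsing.
// Assumption: s is already plain text; NFC normalization is not applied here.
func normalizeForMatch(s string) string {
	s = html.UnescapeString(s)
	s = strings.Map(func(r rune) rune {
		switch r {
		case '\u00A0': // non-breaking space becomes a normal space
			return ' '
		case '\u200B', '\u00AD', '\uFEFF': // zero-width and soft-hyphen runes are dropped
			return -1
		case '\u201C', '\u201D': // curly double quotes
			return '"'
		case '\u2018', '\u2019': // curly single quotes
			return '\''
		case '\u2013', '\u2014': // en and em dash
			return '-'
		}
		return r
	}, s)
	s = strings.ToLower(s)
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	snippet := "Tom&nbsp;&amp; Jerry&#39;s  \u201CPlan\u201D"
	section := "Tom &amp; Jerry\u2019s \"Plan\""
	fmt.Println(normalizeForMatch(snippet) == normalizeForMatch(section))
}
```

The byte-identical unit test asked for above is the `main` comparison turned into a table of snippet/section pairs. Apply the function only to the haystack and needle of the `Contains` match, never to the emitted path, so the `data-header` invariant holds.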
5. MCP tool descriptions — explain the breadcrumb + drill-down (effort S) — task #7
- File: `internal/case/mcp/resolve.go`: `search` description (line ~178), `federated_search` description, `note_html` description (~203).
- Do: document the complementary drill-down so the consuming LLM uses it:
  - `search` returns a chunk snippet with a heading breadcrumb `{title} > {section} > {subsection}`, which locates the approximate section;
  - the search result's TOC (path arrays) gives the note's precise structure;
  - `note_html(toc_path=[…])` reads the exact section.

  Frame breadcrumb → TOC → `toc_path` as fuzzy → structure → precise.
- Commit: `docs(mcp): describe breadcrumb + toc_path drill-down in tool descriptions`.
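The first hop of that drill-down, reading the breadcrumb off a chunk, can be sketched as below. This is an assumption-laden illustration: `breadcrumbPath` is a hypothetical helper, the split relies on the F4 format `{title} > {h1} > {h2}\n\n{body}`, and whether the note title belongs in `toc_path` has to be checked against `navigateSectionPath`.

```go
package main

import (
	"fmt"
	"strings"
)

// breadcrumbPath splits the breadcrumb line of a chunk shaped like
// "{title} > {h1} > {h2}\n\n{body}" into the note title and the heading path.
// Caveat: a heading that itself contains " > " would be split incorrectly.
func breadcrumbPath(chunk string) (title string, path []string) {
	head, _, _ := strings.Cut(chunk, "\n\n")
	parts := strings.Split(head, " > ")
	for i := range parts {
		parts[i] = strings.TrimSpace(parts[i])
	}
	return parts[0], parts[1:]
}

func main() {
	title, path := breadcrumbPath("Guide > Install > Linux\n\nRun the installer.")
	fmt.Printf("%q %q\n", title, path)
}
```

The heading strings are returned untouched apart from trimming, so they can be compared against `data-header` values without normalization drift.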
6. Docs (delegate to the trip2g-docs agent — EN + RU passes)
- `docs/dev/vector_search.md` (task #2): it's stale (lists hybrid search as "future"). Rewrite to current reality: hybrid BM25+vector RRF k=60, per-chunk embeddings, brute-force cosine, bge-m3 default, per-language analyzer, optional reranker (off), and the F4 chunk format `{title} > {h1} > {h2}\n\n{body}` + token sizing. Note the breadcrumb is non-obvious and appears in snippets. Verify against code.
- `docs/{en,ru}/thoughts/…` (task #6): article on the bugs found + the benchmark + the fixes. Results first (the deltas table from `search_refactoring.md`), details below. Follow `docs/CLAUDE.md` writing rules (the point first, no officialese).
- `docs/{en,ru}/user/Fuzzy Pointer.md` (task #8): user-facing concept note: vector search returns a deliberately imprecise pointer (breadcrumb) that resolves via the TOC to an exact section (`note_html` `toc_path`). Three-step drill-down; how AI agents use it over a published vault.
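For the `vector_search.md` rewrite, the "RRF k=60" item refers to reciprocal rank fusion of the BM25 and vector rankings. A minimal sketch of the standard formula follows; `rrfFuse` is a hypothetical name, each input list is assumed to contain unique IDs, and the real fusion code should be the source of truth for the doc.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges ranked ID lists with reciprocal rank fusion:
// score(id) = sum over lists of 1/(k + rank), with rank starting at 1.
// Ties are broken by ID so the output is deterministic.
func rrfFuse(k float64, lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	return ids
}

func main() {
	bm25 := []string{"a", "b", "c"}
	vector := []string{"c", "a", "d"}
	fmt.Println(rrfFuse(60, bm25, vector))
}
```

A document can only be rewarded for appearing in both lists if both lists are long enough to contain it, which is the intuition behind the F1 pool widening (`vectorTopK` 5→50).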
Decided AGAINST (do not build)
federated_graphql_request— arbitrary GraphQL across federation trust boundaries. Federation is deliberately narrow (scoped by inbound/outbound secrets + subgraph grants); letting peer A run arbitrary GraphQL on peer B (as admin) is a different, much worse risk class, plus schema-drift across peer versions and N-shape result merging. If structured cross-pool data is ever needed, add specific curated federated read fields, not arbitrary GraphQL.
Suggested order
F5 (finish the fix track) → Option A + query-only guard (small, high value, satisfies the GraphQL-projection ask safely) → harden tocPathForSnippet → MCP descriptions → docs last (so they describe the final state).