Token-economy baseline: naive grep approach vs MCP retrieval
Date: 2026-06-22
Methodology
For each question, we simulated what a coding agent without MCP does:
- Run `grep -rl <terms> docs/ --include="*.md"` to locate candidate files.
- Open the primary file(s) a reasonable agent would pick from those results. Where
  `docs/en/user/` contained a dedicated user doc AND `docs/dev/` had a dev doc on the
  same topic, we counted both: an agent cannot tell from the filename alone which one
  holds the answer, so it reads both.
- Token cost = grep output tokens + all file tokens read before a confident answer.
- Tool call count = every grep call + every file read until a confident answer.
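The two accounting rules above can be written down as a small sketch. The `QuestionCost` class and its field names are illustrative, not part of the benchmark tooling; the numbers are the Q1 and Q7 figures from the Results table.

```python
from dataclasses import dataclass


@dataclass
class QuestionCost:
    """Cost of answering one question with grep + file reads (hypothetical helper)."""

    grep_tokens: int        # tokens in the grep output the agent had to scan
    file_tokens: list[int]  # tokens of each file read before a confident answer
    grep_calls: int = 1     # every question in this benchmark used a single grep

    @property
    def total_tokens(self) -> int:
        # Token cost = grep output tokens + all file tokens read.
        return self.grep_tokens + sum(self.file_tokens)

    @property
    def tool_calls(self) -> int:
        # Tool call count = every grep call + every file read.
        return self.grep_calls + len(self.file_tokens)


q1 = QuestionCost(grep_tokens=992, file_tokens=[1289, 1823])
q7 = QuestionCost(grep_tokens=1583, file_tokens=[525])

print(q1.total_tokens, q1.tool_calls)  # 4104 3
print(q7.total_tokens, q7.tool_calls)  # 2108 2
```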
Tokenizer (the same definition used in the MCP benchmark, for comparability):
python3 -c "import re,sys; print(len(re.findall(r'\w+|[^\w\s]', open(sys.argv[1], encoding='utf-8').read())))" <file>
# piped content:
echo "$TEXT" | python3 -c "import re,sys; print(len(re.findall(r'\w+|[^\w\s]', sys.stdin.read())))"
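For repeated measurements, the same regex can be wrapped in a reusable function. This is a minimal sketch; the names `count_tokens` and `count_file_tokens` are illustrative. The regex counts runs of word characters and individual punctuation marks, so it approximates rather than reproduces any model's subword tokenizer.

```python
import re

# Same pattern as the one-liners above: a run of word characters,
# or any single character that is neither a word character nor whitespace.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def count_tokens(text: str) -> int:
    """Return the number of word/punctuation tokens in text."""
    return len(TOKEN_RE.findall(text))


def count_file_tokens(path: str) -> int:
    """Return the token count of a UTF-8 text file."""
    with open(path, encoding="utf-8") as fh:
        return count_tokens(fh.read())


print(count_tokens("Hello, world!"))    # 4: Hello , world !
print(count_tokens("two-way-sync.md"))  # 7: two - way - sync . md
```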
Results
| # | Question | Files opened | Tokens: grep | Tokens: files | Total tokens | Tool calls | Confident? |
|---|---|---|---|---|---|---|---|
| 1 | how do webhooks work | dev/webhooks.md, en/user/webhooks.md | 992 | 1 289 + 1 823 | 4 104 | 3 | yes |
| 2 | how do i publish a post to telegram | en/user/telegram.md, en/user/publishing.md | 1 400 | 1 632 + 1 039 | 4 071 | 3 | yes |
| 3 | set up a custom domain for my site | dev/multidomain.md, en/user/multidomains.md | 1 166 | 2 694 + 1 856 | 5 716 | 3 | yes |
| 4 | how to use multiple languages on my site | dev/multilang.md, en/user/multilingual.md | 652 | 3 178 + 1 271 | 5 101 | 3 | yes |
| 5 | two way sync between obsidian and the site | dev/obsidian_sync.md, en/user/two-way-sync.md | 980 | 6 144 + 688 | 7 812 | 3 | yes |
| 6 | what templates are available | en/user/templates.md, en/user/default-template.md | 2 594 | 2 751 + 4 449 | 9 794 | 3 | yes |
| 7 | accept paid subscriptions and monetization | en/user/monetization.md | 1 583 | 525 | 2 108 | 2 | yes |
| 8 | telegram post types and limits | en/user/telegram.md, dev/telegram_bot_vs_userbot.md | 585 | 1 632 + 1 309 | 3 526 | 3 | yes |
Sorted totals: 2 108 · 3 526 · 4 071 · 4 104 · 5 101 · 5 716 · 7 812 · 9 794
Median tokens-to-answer: 4 603 (mean of the two middle values, 4 104 and 5 101, which is 4 602.5, rounded up)
Median tool calls: 3 (Q7 needed only 2; all others needed 1 grep + 2 file reads)
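The medians can be reproduced from the per-question totals in the table. A minimal sketch using Python's standard `statistics` module; the dictionary names are illustrative. Python's built-in `round()` rounds halves to the nearest even integer (4 602.5 would become 4 602), so the sketch rounds half up explicitly to match the reported figure.

```python
import statistics

# Total tokens and tool calls per question, copied from the Results table.
total_tokens = {1: 4104, 2: 4071, 3: 5716, 4: 5101, 5: 7812, 6: 9794, 7: 2108, 8: 3526}
tool_calls = {1: 3, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3, 7: 2, 8: 3}

# With eight values, the median is the mean of the 4th and 5th sorted values.
median_tokens = statistics.median(total_tokens.values())
median_calls = statistics.median(tool_calls.values())

# Round half up to match the figure reported above.
reported_median = int(median_tokens + 0.5)

print(median_tokens, reported_median, median_calls)  # 4602.5 4603 3.0
```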
Comparison vs MCP retrieval
| Method | Median tokens to answer | Median tool calls |
|---|---|---|
| MCP focused-section retrieval | ~200 | 2 (search + note_html) |
| MCP whole-note retrieval | ~2 700 | 2 (search + note_html) |
| Naive grep + file read (this benchmark) | ~4 600 | 3 |
The grep approach costs roughly 23× more tokens than the MCP focused-section read
(4 603 / 200), and about 1.7× more than reading a whole note via MCP.
Tool call counts are almost the same (3 vs 2). The difference is not in round-trips but in
what each call returns: the MCP search returns a ranked, focused result (one section from
one note), whereas the file-system grep returns a list of 20–30 filenames with no content,
forcing full-file reads before the agent can proceed. The overhead is all token volume, not
call count.
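The two ratios quoted above follow directly from the medians in the comparison table. A quick check, bearing in mind that the MCP figures are themselves approximate (~200 and ~2 700), so the ratios are rough; the constant names are illustrative.

```python
GREP_MEDIAN = 4603   # naive grep + file read, median tokens (this benchmark)
MCP_FOCUSED = 200    # MCP focused-section retrieval, approximate median
MCP_WHOLE = 2700     # MCP whole-note retrieval, approximate median

ratio_focused = GREP_MEDIAN / MCP_FOCUSED
ratio_whole = GREP_MEDIAN / MCP_WHOLE

print(f"vs focused section: {ratio_focused:.0f}x")  # vs focused section: 23x
print(f"vs whole note: {ratio_whole:.1f}x")         # vs whole note: 1.7x
```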
Caveats
- EN + RU duplication inflates grep output. `docs/en/user/` and `docs/ru/user/` mirror
  the same files. Every topic grep returns both language trees, roughly doubling the list
  the agent must scan. We counted EN files only when the EN file was clearly primary, but
  the noise still shows up in the grep token cost.
- Whole-file reads dominate. Grep is cheap (585–2 594 tokens in this benchmark). The
  dominant cost is that a naive agent reads full files: it does not know which section
  holds the answer, so it consumes the entire file. `dev/obsidian_sync.md` alone is
  6 144 tokens.
- Not a worst case. For Q6 (templates) we counted only 2 user-doc files. An agent that
  also opens `dev/default_template.md` (4 348 tokens) and `dev/layouts.md` (2 731 tokens)
  would spend 16 000+ tokens on that question alone (9 794 + 4 348 + 2 731 = 16 873).
- Q7 is the cheap outlier. `en/user/monetization.md` is 525 tokens and squarely
  answers the question; a well-named file makes grep effective for this case. It also needed
  only 2 tool calls (1 grep + 1 read) rather than the usual 3.
- Tool call counts look deceptively close to MCP (3 vs 2). The key difference is signal
  per call: grep returns filenames only, so the file reads that follow are required just to
  get to readable content. MCP returns a focused answer in the second call.
- These are realistic, not inflated numbers. The 23× token ratio reflects what a real agent
  paying full attention to the grep results would spend.
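The Q6 figure in the "not a worst case" caveat can be checked by adding the extra dev docs to the counted total. The sketch below uses only token counts stated in this report; the variable names are illustrative.

```python
Q6_COUNTED = 9794            # Q6 total as counted in the Results table
DEV_DEFAULT_TEMPLATE = 4348  # dev/default_template.md
DEV_LAYOUTS = 2731           # dev/layouts.md

with_default_template = Q6_COUNTED + DEV_DEFAULT_TEMPLATE
with_layouts = Q6_COUNTED + DEV_LAYOUTS
with_both = Q6_COUNTED + DEV_DEFAULT_TEMPLATE + DEV_LAYOUTS

print(with_default_template)  # 14142
print(with_layouts)           # 12525
print(with_both)              # 16873
```

Only the case where the agent opens both dev docs crosses 16 000 tokens.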