Token-economy baseline: naive grep approach vs MCP retrieval

Date: 2026-06-22

Methodology

For each question, we simulated what a coding agent without MCP does:

Run grep -rl <terms> docs/ --include="*.md" to locate candidate files.
Open the primary file(s) a reasonable agent would pick from those results. Where
docs/en/user/ contained a dedicated user doc AND docs/dev/ had a dev doc on the
same topic, we counted both — an agent cannot tell from the filename alone which one
holds the answer, so it reads both.
Token cost = grep output tokens + all file tokens read before a confident answer.
Tool call count = every grep call + every file read until confident answer.

Tokenizer (same definition as in the MCP benchmark, for comparability):

python3 -c "import re,sys; print(len(re.findall(r'\w+|[^\w\s]', open(sys.argv[1], encoding='utf-8').read())))" <file>
# piped content:
echo "$TEXT" | python3 -c "import re,sys; print(len(re.findall(r'\w+|[^\w\s]', sys.stdin.read())))"
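
When many questions are scored, the one-liners above can be wrapped in a small helper. The following is a minimal sketch, not the script used for this benchmark. `count_tokens` applies the same regular expression. `question_cost` and its arguments are hypothetical names that follow the two accounting rules from the methodology: tokens are grep output plus every file read, and each grep and each read counts as one tool call.

```python
import re

# Same token definition as the one-liners above: a run of word
# characters, or a single character that is neither word nor whitespace.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def count_tokens(text):
    """Count tokens in a string with the benchmark's regex."""
    return len(TOKEN_RE.findall(text))


def question_cost(grep_outputs, file_texts):
    """Return (total tokens, tool calls) for one question.

    Hypothetical helper: grep_outputs is a list with the raw stdout of
    each grep call, file_texts a list with the full text of each file read.
    """
    tokens = sum(count_tokens(t) for t in grep_outputs)
    tokens += sum(count_tokens(t) for t in file_texts)
    tool_calls = len(grep_outputs) + len(file_texts)
    return tokens, tool_calls


if __name__ == "__main__":
    # Toy input, not benchmark data.
    grep_out = "docs/en/user/webhooks.md\ndocs/dev/webhooks.md\n"
    files = ["Webhooks notify your server.", "POST /hook is called."]
    print(question_cost([grep_out], files))
```

Path separators and dots each count as one token, which is why a grep listing of a few dozen paths already costs several hundred tokens.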

Results

#	Question	Files opened	Tokens: grep	Tokens: files	Total tokens	Tool calls	Confident?
1	how do webhooks work	`dev/webhooks.md`, `en/user/webhooks.md`	992	1 289 + 1 823	4 104	3	yes
2	how do i publish a post to telegram	`en/user/telegram.md`, `en/user/publishing.md`	1 400	1 632 + 1 039	4 071	3	yes
3	set up a custom domain for my site	`dev/multidomain.md`, `en/user/multidomains.md`	1 166	2 694 + 1 856	5 716	3	yes
4	how to use multiple languages on my site	`dev/multilang.md`, `en/user/multilingual.md`	652	3 178 + 1 271	5 101	3	yes
5	two way sync between obsidian and the site	`dev/obsidian_sync.md`, `en/user/two-way-sync.md`	980	6 144 + 688	7 812	3	yes
6	what templates are available	`en/user/templates.md`, `en/user/default-template.md`	2 594	2 751 + 4 449	9 794	3	yes
7	accept paid subscriptions and monetization	`en/user/monetization.md`	1 583	525	2 108	2	yes
8	telegram post types and limits	`en/user/telegram.md`, `dev/telegram_bot_vs_userbot.md`	585	1 632 + 1 309	3 526	3	yes

Sorted totals: 2 108 · 3 526 · 4 071 · 4 104 · 5 101 · 5 716 · 7 812 · 9 794

Median tokens-to-answer: 4 603

Median tool calls: 3 (Q7 needed only 2; all others needed 1 grep + 2 file reads)
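
The totals and both medians can be rechecked from the table. A short sketch, with the per-question figures copied from the Results table (the variable names are illustrative):

```python
from statistics import median

# question number -> (grep tokens, [tokens per file read]), from the Results table
rows = {
    1: (992, [1289, 1823]),
    2: (1400, [1632, 1039]),
    3: (1166, [2694, 1856]),
    4: (652, [3178, 1271]),
    5: (980, [6144, 688]),
    6: (2594, [2751, 4449]),
    7: (1583, [525]),
    8: (585, [1632, 1309]),
}

# Total tokens = grep output + every file read; one tool call per grep and per read.
totals = {q: grep + sum(files) for q, (grep, files) in rows.items()}
tool_calls = {q: 1 + len(files) for q, (_, files) in rows.items()}

print(sorted(totals.values()))
print(median(totals.values()))
print(median(tool_calls.values()))
```

With eight questions the median is the mean of the 4th and 5th sorted totals, (4 104 + 5 101) / 2 = 4 602.5, which is reported above rounded to 4 603.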

Comparison vs MCP retrieval

Method	Median tokens to answer	Median tool calls
MCP focused-section retrieval	~200	2 (search + note_html)
MCP whole-note retrieval	~2 700	2 (search + note_html)
Naive grep + file read (this benchmark)	~4 600	3

The grep approach costs roughly 23× more tokens than the MCP focused-section read
(4 603 / 200), and about 1.7× more than reading a whole note via MCP.
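
The two ratios follow directly from the medians in the table. A quick check, keeping in mind that the MCP medians are approximate figures as given in the table:

```python
# Medians from the comparison table above.
grep_median = 4603
mcp_focused = 200       # approximate (~200)
mcp_whole_note = 2700   # approximate (~2 700)

ratio_focused = grep_median / mcp_focused
ratio_whole = grep_median / mcp_whole_note
print(f"{ratio_focused:.1f}x vs focused section, {ratio_whole:.1f}x vs whole note")
# prints: 23.0x vs focused section, 1.7x vs whole note
```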

Tool call count is almost the same: 3 vs 2. The difference is not in round-trips but in
what each call returns: the MCP search returns a ranked, focused result (one section from
one note); the file-system grep returns a list of 20–30 filenames with no content, so the
agent must read whole files before it can answer. The overhead is all token volume, not
call count.

Caveats

EN + RU duplication inflates grep output. docs/en/user/ and docs/ru/user/ mirror
the same files. Every topic grep returns both language trees, roughly doubling the list
the agent must scan. Where the EN file was clearly primary we counted only the EN read,
but the duplicate paths still show up in the grep token cost.
Whole-file reads dominate. Grep is cheap (roughly 600–2 600 tokens per question). The
dominant cost is that a naive agent reads full files: it does not know which section holds
the answer, so it consumes the entire file. dev/obsidian_sync.md alone is 6 144 tokens.
Not a worst case. For Q6 (templates) we counted only 2 user-doc files. An agent that
also opens both dev/default_template.md (4 348 tokens) and dev/layouts.md (2 731 tokens)
would spend 16 000+ tokens on that question alone.
Q7 is the cheap outlier. en/user/monetization.md is 525 tokens and squarely
answers the question; a well-named file makes grep effective for this case. It also needed
only 2 tool calls (1 grep + 1 read) rather than the usual 3.
Tool call counts look deceptively close to MCP (3 vs 2). The key difference is signal
per call: grep returns filenames only, so the agent still needs two full-file reads before
it can answer, while MCP returns a focused answer on the second call.
These are realistic, not inflated numbers. The 23× token ratio reflects what a real agent
spends when it reads the grep results carefully and opens only the most plausible files.
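
Two of the caveats above can be checked against the table's own figures. A small sketch, with values copied from the Results table and from the Q6 caveat (variable names are illustrative):

```python
# Per-question figures copied from the Results table (Q1..Q8).
grep_tokens = [992, 1400, 1166, 652, 980, 2594, 1583, 585]
file_tokens = [1289 + 1823, 1632 + 1039, 2694 + 1856, 3178 + 1271,
               6144 + 688, 2751 + 4449, 525, 1632 + 1309]

# Share of all tokens spent on whole-file reads rather than grep output.
file_share = sum(file_tokens) / (sum(grep_tokens) + sum(file_tokens))
print(f"file reads: {file_share:.0%} of all tokens")

# Q6 if the agent also opens the two dev docs named in the caveat.
q6_counted = grep_tokens[5] + file_tokens[5]
q6_extended = q6_counted + 4348 + 2731
print(q6_counted, q6_extended)  # prints: 9794 16873
```

Across the eight questions, file reads account for about three quarters of all tokens. Q7 is the one question where the grep output (1 583 tokens) costs more than the file read (525 tokens).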