Federation Telegram Adapter Plan

Goal

Build federation_telegram_adapter: a small MCP-speaking service that indexes configured Telegram channels, groups, and private dialogs into Qdrant, then exposes them to a trip2g federation hub as a leaf knowledge base.

The adapter is for the demo path where people do not maintain Obsidian notes. They keep writing in Telegram; the adapter turns those dialogs into searchable operational memory that the hub can query through federated_search.

Relationship To MCP Federation

The active federation contract is in:

  • docs/dev/mcp_federation.md
  • docs/plans/2026-04-27-mcp-federation.md
  • docs/superpowers/plans/2026-04-27-mcp-federation.md

The adapter behaves like a remote leaf base:

  • Hub KB-note points to https://adapter.example/_system/mcp.
  • Adapter accepts one scoped capability token via either the Authorization: Bearer <token> header or the ?token=... query parameter.
  • Adapter maps the token to allowed Telegram dialog scopes. These scopes are the adapter's subgraphs.
  • The same endpoint and token work for direct chat MCP clients and for trip2g federation proxying. There is no separate "direct mode" vs "federation mode".
  • Hub does not know Telegram credentials and does not store Telegram content.

For compatibility, the adapter should expose the same six MCP tools:

search(query)
similar(note_id|pid)
note_html(note_id|pid)

federated_search(query, kb_id?, kb_ids?)
federated_similar(note_id, kb_id)
federated_note_html(note_id, kb_id, match_id?)

Demo implementation priority:

  1. search — required, carries the real query DSL.
  2. note_html — required enough to open a selected message/thread.
  3. similar — return empty result set for MVP.
  4. federated_* — return leaf/no-federation-configured payload for MVP.

The slightly strange part is intentional: dialog listing and structured message retrieval are encoded inside search(query) as JSON. This keeps the federation hub unchanged while still giving the agent richer Telegram-specific operations.

Data Model

Use one Qdrant collection, for example telegram_knowledge.

Each point represents one retrievable record. MVP record types:

  • message — one Telegram message.
  • dialog — one channel/group/private dialog metadata record.
  • thread_summary — optional later, a summarized window/thread.
  • decision — optional later, extracted decision.
  • task_state — optional later, extracted task/status.
  • profile_fact — optional later, extracted fact about a person or project.

MVP can index only message and dialog; the other record types are reserved so the query DSL does not need a breaking change later.

Payload shape:

{
  "record_type": "message",
  "source": "telegram",
  "scope_id": "tg:chat:-1001234567890",
  "dialog_id": "tg:chat:-1001234567890",
  "dialog_type": "channel",
  "dialog_title": "Project Alpha",
  "message_id": 54321,
  "user_id": "tg:user:123456",
  "username": "alice",
  "display_name": "Alice",
  "created_at": "2026-04-28T12:34:56Z",
  "created_at_ts": 1777379696,
  "text": "message text",
  "source_url": "https://t.me/c/1234567890/54321",
  "indexed_at": "2026-04-28T12:40:00Z",
  "indexed_at_ts": 1777380000
}

Dialog metadata point:

{
  "record_type": "dialog",
  "source": "telegram",
  "scope_id": "tg:chat:-1001234567890",
  "dialog_id": "tg:chat:-1001234567890",
  "dialog_type": "group",
  "dialog_title": "Trip2G Builders",
  "created_at": "2026-04-28T00:00:00Z",
  "created_at_ts": 1777334400,
  "last_message_at": "2026-04-28T12:34:56Z",
  "last_message_at_ts": 1777379696,
  "participants_hint": 12
}

Stable point id:

  • Use a deterministic 63-bit integer for MCP note_id/pid.
  • Suggested input: sha256("telegram:" + dialog_id + ":" + message_id) truncated to positive int64.
  • Dialog point uses sha256("telegram-dialog:" + dialog_id).
  • Store the original string ids in payload to avoid relying on the hash for human debugging.
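
The id scheme above can be sketched as follows. This is a minimal illustration, not the adapter's actual code: the plan does not say which bytes of the digest to keep or in which byte order, so taking the first 8 bytes big-endian and clearing the sign bit is an assumption, and the function names are hypothetical.

```python
import hashlib


def stable_point_id(key: str) -> int:
    """Hash a key with SHA-256 and keep 63 bits so the id fits a non-negative int64."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Assumption: first 8 bytes, big-endian, with the sign bit cleared.
    return int.from_bytes(digest[:8], "big") & 0x7FFF_FFFF_FFFF_FFFF


def message_point_id(dialog_id: str, message_id: int) -> int:
    """Id for a message point, using the input string suggested above."""
    return stable_point_id(f"telegram:{dialog_id}:{message_id}")


def dialog_point_id(dialog_id: str) -> int:
    """Id for a dialog metadata point."""
    return stable_point_id(f"telegram-dialog:{dialog_id}")
```

Because the id depends only on dialog_id and message_id, re-indexing the same message upserts the same point, which is what makes repeated backfills idempotent.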

Required Qdrant payload indexes:

{"field_name": "record_type", "field_schema": "keyword"}
{"field_name": "scope_id", "field_schema": "keyword"}
{"field_name": "dialog_id", "field_schema": "keyword"}
{"field_name": "dialog_type", "field_schema": "keyword"}
{"field_name": "user_id", "field_schema": "keyword"}
{"field_name": "username", "field_schema": "keyword"}
{"field_name": "created_at", "field_schema": "datetime"}
{"field_name": "created_at_ts", "field_schema": "integer"}
{"field_name": "last_message_at_ts", "field_schema": "integer"}

created_at_ts is the pragmatic sort key for "latest N messages"; keep ISO created_at for readability.
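
One way to ensure these indexes at startup is to generate one request per field. The sketch below only builds the request descriptions and sends nothing; it assumes Qdrant's REST endpoint for payload indexes (PUT on /collections/{collection}/index), and the helper name index_requests is hypothetical.

```python
from typing import Dict, List, Tuple

# Field/schema pairs copied from the index list above.
PAYLOAD_INDEXES: List[Tuple[str, str]] = [
    ("record_type", "keyword"),
    ("scope_id", "keyword"),
    ("dialog_id", "keyword"),
    ("dialog_type", "keyword"),
    ("user_id", "keyword"),
    ("username", "keyword"),
    ("created_at", "datetime"),
    ("created_at_ts", "integer"),
    ("last_message_at_ts", "integer"),
]


def index_requests(collection: str) -> List[Tuple[str, str, Dict[str, str]]]:
    """Return (method, path, body) triples, one per payload index."""
    path = f"/collections/{collection}/index"
    return [
        ("PUT", path, {"field_name": field, "field_schema": schema})
        for field, schema in PAYLOAD_INDEXES
    ]
```

Keeping the list as data makes it easy to assert in tests that the adapter and this plan agree on the index set.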

Token Auth And Dialog Subgraphs

Adapter config maps each token to allowed Telegram dialog scopes.

Terminology:

  • token: opaque capability credential generated by the adapter and shown once to the user.
  • scope_id: one Telegram dialog scope, for example tg:chat:-1001234567890.
  • subgraph: federation-facing name for a scope_id. In this adapter, one selected dialog is one subgraph.

The adapter accepts the token in two transport forms:

  • Preferred: Authorization: Bearer <token>.
  • Compatibility: POST /_system/mcp?token=<token> for clients that cannot set headers easily.

These are not separate auth modes. They are two ways to pass the same token. If both are present, the header wins. Logs must never include token values.
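
A minimal sketch of the extraction rule, with the header taking precedence over the query parameter. The function name and its inputs (a header mapping and the raw query string) are assumptions made for illustration; a real server would take them from its HTTP framework.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qs


def extract_token(headers: Mapping[str, str], query_string: str) -> Optional[str]:
    """Return the bearer token from the header if present, else from ?token=."""
    # Header names are case-insensitive, so compare them lowercased.
    auth = next((v for k, v in headers.items() if k.lower() == "authorization"), "")
    scheme, _, value = auth.partition(" ")
    if scheme.lower() == "bearer" and value.strip():
        return value.strip()
    # Fallback transport: the token query parameter (blank values are ignored).
    values = parse_qs(query_string).get("token")
    return values[0] if values else None
```

The function returns the token to the caller and never logs it, in line with the rule that token values must stay out of logs.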

Example config:

public_url = "https://telegram-adapter.example"
qdrant_url = "http://localhost:6333"
collection = "telegram_knowledge"

[[telegram.sources]]
scope_id = "tg:chat:-1001234567890"
kind = "channel"
title = "Project Alpha"

[[telegram.sources]]
scope_id = "tg:chat:-1009876543210"
kind = "group"
title = "Builders Group"

[[tokens]]
name = "trip2g-hub-2026"
token_env = "FEDERATION_TG_TOKEN"
allowed_scopes = [
  "tg:chat:-1001234567890",
  "tg:chat:-1009876543210"
]

Inbound request behavior:

  1. Extract token from Authorization: Bearer <token> or ?token=....
  2. Missing token: anonymous mode; only scopes explicitly marked public in config.
  3. Invalid token: return empty private results without revealing which scopes exist.
  4. Compute effective scopes: the intersection of the scopes requested in filter and the token's allowed_scopes. A request that names no scopes gets all allowed_scopes.
  5. If effective scopes are empty, return empty results rather than leaking existence.

This mirrors trip2g federation behavior: inaccessible targets are indistinguishable from not configured.
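
The five steps can be sketched as one pure function. This is an illustration under stated assumptions: the table is keyed by the raw token only for brevity (a real adapter would look tokens up through hash comparison), an invalid token is given no scopes at all, and the names effective_scopes, token_scopes, and public_scopes are hypothetical.

```python
from typing import FrozenSet, Iterable, Mapping, Optional


def effective_scopes(
    token: Optional[str],
    requested: Optional[Iterable[str]],
    token_scopes: Mapping[str, FrozenSet[str]],
    public_scopes: FrozenSet[str],
) -> FrozenSet[str]:
    """Return the scopes a request may read; empty means 'return no results'."""
    if token is None:
        # Missing token: anonymous mode, public scopes only.
        allowed = public_scopes
    else:
        # Assumption: an unknown token gets nothing, not even public scopes.
        allowed = token_scopes.get(token, frozenset())
    if requested is None:
        return allowed
    # Requested scopes can only narrow the allowed set, never widen it.
    return allowed & frozenset(requested)
```

An empty result is handled the same way whether the scope does not exist or is merely not allowed, so callers cannot tell the two cases apart.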

User Onboarding Scenario

  1. User opens the adapter setup UI and signs in as the adapter owner.
  2. User connects Telegram by entering the api_id and api_hash issued at my.telegram.org, then the phone login code and, if enabled, the 2FA password.
  3. Adapter stores the Telegram session encrypted at rest. The trip2g hub never receives Telegram credentials.
  4. Adapter lists available channels, groups, and private dialogs.
  5. User selects which dialogs to index. This creates the ingestion source list.
  6. User creates one access token and chooses which indexed dialogs this token exposes.
  7. Adapter shows MCP connection data:
URL: https://telegram-adapter.example/_system/mcp
Header: Authorization: Bearer <token>
Fallback URL: https://telegram-adapter.example/_system/mcp?token=<token>

The selected dialogs are the token's subgraphs. Creating another token with a different dialog subset creates another capability boundary without changing Telegram ingestion.

Search Query DSL

search keeps the MCP schema as {"query": "..."}.

If query is plain text, adapter runs default semantic message search:

{
  "query": "what did Alice decide about Telegram adapter?"
}

If query parses as JSON, adapter treats it as a structured Telegram query.

Envelope:

{
  "op": "search",
  "text": "Telegram adapter decisions",
  "mode": "hybrid",
  "limit": 20,
  "filter": {
    "record_types": ["message"],
    "dialog_ids": ["tg:chat:-1001234567890"],
    "dialog_types": ["channel", "group", "private"],
    "user_ids": ["tg:user:123456"],
    "usernames": ["alice"],
    "after": "2026-04-01T00:00:00Z",
    "before": "2026-04-29T00:00:00Z"
  },
  "order": "relevance",
  "context": {
    "before": 2,
    "after": 2
  }
}

Fields:

  • op: search, latest, dialogs, or dialog_messages.
  • text: semantic query text. Required for search, optional for latest.
  • mode: vector, keyword, hybrid, or payload. MVP can implement vector and payload; hybrid can be vector plus payload filters until keyword search exists.
  • limit: default 20, max 100.
  • filter.record_types: defaults to ["message"].
  • filter.dialog_ids: restrict to dialogs.
  • filter.dialog_types: restrict to channel, group, private.
  • filter.user_ids: restrict to Telegram sender ids.
  • filter.usernames: restrict to usernames.
  • filter.after / filter.before: timestamp bounds.
  • order: relevance, created_at_desc, created_at_asc, last_message_at_desc.
  • context.before / context.after: optional neighboring messages to include in note_html or snippet expansion.
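
The plain-text versus JSON branching and the field defaults above might be normalized like this. It is a sketch, not the adapter's parser: treating non-object JSON (for example a bare number) as plain text and defaulting record_types to ["dialog"] for op dialogs are assumptions, and parse_query is a hypothetical name.

```python
import json
from typing import Any, Dict

VALID_OPS = {"search", "latest", "dialogs", "dialog_messages"}
DEFAULT_LIMIT = 20
MAX_LIMIT = 100


def parse_query(query: str) -> Dict[str, Any]:
    """Normalize a search(query) string into an envelope dict."""
    try:
        parsed = json.loads(query)
    except ValueError:
        parsed = None
    if not isinstance(parsed, dict):
        # Plain text (or JSON that is not an object): default semantic search.
        return {"op": "search", "text": query, "limit": DEFAULT_LIMIT,
                "filter": {"record_types": ["message"]}}

    op = parsed.get("op", "search")
    if not isinstance(op, str) or op not in VALID_OPS:
        raise ValueError(f"unsupported op: {op!r}")
    text = parsed.get("text")
    if op == "search" and not text:
        raise ValueError("text is required when op is search")

    limit = parsed.get("limit", DEFAULT_LIMIT)
    if isinstance(limit, bool) or not isinstance(limit, int):
        limit = DEFAULT_LIMIT
    limit = max(1, min(limit, MAX_LIMIT))  # default 20, max 100

    flt = dict(parsed.get("filter") or {})
    # Assumption: op=dialogs defaults to dialog records, everything else to messages.
    flt.setdefault("record_types", ["dialog"] if op == "dialogs" else ["message"])
    return {"op": op, "text": text, "limit": limit, "filter": flt}
```

Clamping limit instead of rejecting oversized values keeps agent-written queries working even when they ask for too much.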

Operation: semantic search

Query:

{
  "op": "search",
  "text": "gold mine ideas for vibe coding army",
  "mode": "vector",
  "limit": 10,
  "filter": {
    "dialog_ids": ["tg:chat:-1001234567890"],
    "record_types": ["message"]
  }
}

Qdrant behavior:

  • Embed text.
  • Use Qdrant vector query over telegram_knowledge.
  • Apply payload filter for record_type, dialog_id, authorized scopes, and optional date/sender constraints.
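
The payload filter could be assembled roughly as below, using Qdrant's must / match.any / range condition shapes. Several details are assumptions: the helper names, the inclusive lower and exclusive upper date bound, and the expectation that the caller has already returned empty results when the authorized scope set is empty.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List, Mapping

# DSL filter key -> payload field name.
LIST_FILTERS = (
    ("record_types", "record_type"),
    ("dialog_ids", "dialog_id"),
    ("dialog_types", "dialog_type"),
    ("user_ids", "user_id"),
    ("usernames", "username"),
)


def iso_to_ts(value: str) -> int:
    """Convert an ISO 8601 UTC string such as 2026-04-28T12:34:56Z to epoch seconds."""
    return int(datetime.fromisoformat(value.replace("Z", "+00:00")).timestamp())


def build_filter(flt: Mapping[str, Any], scopes: Iterable[str]) -> Dict[str, Any]:
    """Build a Qdrant filter; authorized scopes are always the first condition."""
    must: List[Dict[str, Any]] = [{"key": "scope_id", "match": {"any": sorted(scopes)}}]
    for dsl_key, payload_key in LIST_FILTERS:
        values = flt.get(dsl_key)
        if values:
            must.append({"key": payload_key, "match": {"any": list(values)}})
    bounds: Dict[str, int] = {}
    if flt.get("after"):
        bounds["gte"] = iso_to_ts(flt["after"])  # assumption: inclusive
    if flt.get("before"):
        bounds["lt"] = iso_to_ts(flt["before"])  # assumption: exclusive
    if bounds:
        must.append({"key": "created_at_ts", "range": bounds})
    return {"must": must}
```

Putting the scope condition in unconditionally means a forgotten filter elsewhere cannot widen access beyond the token's scopes.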

MCP result item:

{
  "title": "Project Alpha / Alice / 2026-04-28 12:34",
  "note_id": 8248823371200214,
  "note_path": "telegram/tg:chat:-1001234567890/54321",
  "href": "tg://message/tg:chat:-1001234567890/54321",
  "url": "https://t.me/c/1234567890/54321",
  "kind": "telegram_message",
  "score": 0.83,
  "matches": [
    {
      "match_id": "tg:chat:-1001234567890:54321",
      "snippet": "We should test ten ideas in parallel and move resources to the winners...",
      "context_words": 40
    }
  ]
}
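
Mapping a Qdrant point to this result item might look like the sketch below. The function name, the 160-character snippet cut, and the fallback author label are assumptions; context_words is left out because the plan does not define how it is computed.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Mapping


def to_result_item(point_id: int, payload: Mapping[str, Any], score: float,
                   snippet_chars: int = 160) -> Dict[str, Any]:
    """Map one message payload to an MCP search result item."""
    dialog_id = payload["dialog_id"]
    message_id = payload["message_id"]
    when = datetime.fromtimestamp(payload["created_at_ts"], tz=timezone.utc)
    author = payload.get("display_name") or payload.get("username") or "unknown"
    return {
        "title": f"{payload['dialog_title']} / {author} / {when:%Y-%m-%d %H:%M}",
        "note_id": point_id,
        "note_path": f"telegram/{dialog_id}/{message_id}",
        "href": f"tg://message/{dialog_id}/{message_id}",
        "url": payload.get("source_url"),
        "kind": "telegram_message",
        "score": score,
        "matches": [
            {
                "match_id": f"{dialog_id}:{message_id}",
                # Assumption: a plain prefix of the text serves as the snippet.
                "snippet": payload["text"][:snippet_chars],
            }
        ],
    }
```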

Operation: latest messages

Query:

{
  "op": "latest",
  "limit": 100,
  "filter": {
    "user_ids": ["tg:user:123456"],
    "dialog_ids": ["tg:chat:-1001234567890"]
  },
  "order": "created_at_desc"
}

Qdrant behavior:

  • Use points/scroll.
  • Filter by record_type = message, authorized scopes, user/dialog filters.
  • Set order_by to created_at_ts with direction desc.
  • Requires created_at_ts payload index.

This answers the use case: "give me the last 100 messages by user X".
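
As an illustration, the request body for this operation could be built as follows, assuming Qdrant's scroll endpoint accepts filter, limit, with_payload, and an order_by object with key and direction. The helper name and the extra_conditions parameter are hypothetical.

```python
from typing import Any, Dict, List


def latest_scroll_body(extra_conditions: List[Dict[str, Any]], limit: int) -> Dict[str, Any]:
    """Build a points/scroll body that returns the newest messages first."""
    return {
        "filter": {
            "must": [
                {"key": "record_type", "match": {"value": "message"}},
                # Scope, user, and dialog conditions supplied by the caller.
                *extra_conditions,
            ]
        },
        "limit": max(1, min(limit, 100)),
        # Sorting by a payload field relies on the created_at_ts payload index.
        "order_by": {"key": "created_at_ts", "direction": "desc"},
        "with_payload": True,
    }
```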

Operation: list dialogs

Query:

{
  "op": "dialogs",
  "limit": 100,
  "filter": {
    "dialog_types": ["channel", "group", "private"]
  },
  "order": "last_message_at_desc"
}

Qdrant behavior:

  • Use points/scroll.
  • Filter by record_type = dialog and authorized scopes.
  • Sort by last_message_at_ts desc.

MCP result kind is telegram_dialog. This lets the agent discover available channels/groups/private dialogs through the same federation interface, even though it is not a pure RAG search.

Operation: dialog messages

Query:

{
  "op": "dialog_messages",
  "limit": 50,
  "filter": {
    "dialog_ids": ["tg:chat:-1001234567890"],
    "after": "2026-04-28T00:00:00Z"
  },
  "order": "created_at_asc"
}

Qdrant behavior:

  • Use points/scroll.
  • Filter by one dialog and date range.
  • Sort by created_at_ts.

Useful for reconstructing a conversation window after a semantic hit.
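
Across the four operations, the order field has to be translated into a Qdrant sort key. A small lookup table is one way to do it. The per-operation defaults below are assumptions inferred from the example queries, since the plan does not state defaults, and order_by_for is a hypothetical name.

```python
from typing import Dict, Optional

# DSL order value -> Qdrant order_by object (None means rank by vector score).
ORDER_TO_QDRANT: Dict[str, Optional[Dict[str, str]]] = {
    "relevance": None,
    "created_at_desc": {"key": "created_at_ts", "direction": "desc"},
    "created_at_asc": {"key": "created_at_ts", "direction": "asc"},
    "last_message_at_desc": {"key": "last_message_at_ts", "direction": "desc"},
}

# Assumption: defaults mirror the order used in each example query.
DEFAULT_ORDER: Dict[str, str] = {
    "search": "relevance",
    "latest": "created_at_desc",
    "dialogs": "last_message_at_desc",
    "dialog_messages": "created_at_asc",
}


def order_by_for(op: str, order: Optional[str] = None) -> Optional[Dict[str, str]]:
    """Resolve the Qdrant order_by for an operation and optional order value."""
    name = order or DEFAULT_ORDER[op]
    if name not in ORDER_TO_QDRANT:
        raise ValueError(f"unsupported order: {name!r}")
    return ORDER_TO_QDRANT[name]
```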

MCP Method Behavior

tools/list

Return all six tools. Descriptions should explicitly tell the agent that Telegram-specific structured operations are encoded as JSON inside search.query.

initialize

Return instructions:

This is a Telegram federation adapter. Use search(query) for all retrieval.
Plain text query runs semantic search over authorized Telegram messages.
JSON query enables structured operations:
- {"op":"dialogs"} lists authorized dialogs.
- {"op":"latest","filter":{"user_ids":["tg:user:..."]}} returns recent messages.
- {"op":"dialog_messages","filter":{"dialog_ids":["tg:chat:..."]}} returns a timeline.
After search, use note_html(pid=...) to open a message with context.

search

Required. Implements both plain text and JSON DSL.

Response shape must match trip2g SearchResultPayload:

{
  "query": "...",
  "results": []
}

Allowed result kind values:

  • telegram_message
  • telegram_dialog
  • telegram_thread_summary
  • telegram_decision
  • telegram_task_state
  • telegram_profile_fact

note_html

MVP should implement this even though search is the main method, because the federation UX expects a second call after the user selects a result.

Input:

{"pid": 8248823371200214}

Output:

<article data-source="telegram" data-dialog-id="tg:chat:-1001234567890" data-message-id="54321">
  <h1>Project Alpha / Alice / 2026-04-28 12:34</h1>
  <p>Message text...</p>
  <p><a href="https://t.me/c/1234567890/54321">Open in Telegram</a></p>
</article>

If a later version encodes context.before/context.after in match_id, return the neighboring messages around the selected message. MVP can return only the selected message.
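
A minimal rendering sketch for the single-message case follows. Message text comes from Telegram users, so every interpolated value is HTML-escaped. The function name and the idea of passing the title in separately are assumptions.

```python
from html import escape
from typing import Any, Mapping


def render_message_html(payload: Mapping[str, Any], title: str) -> str:
    """Render one message payload as the <article> fragment shown above."""
    dialog_id = escape(str(payload["dialog_id"]))
    message_id = escape(str(payload["message_id"]))
    parts = [
        f'<article data-source="telegram" data-dialog-id="{dialog_id}" '
        f'data-message-id="{message_id}">',
        f"  <h1>{escape(title)}</h1>",
        f"  <p>{escape(payload['text'])}</p>",
    ]
    url = payload.get("source_url")
    if url:
        # The link is optional because not every dialog has a public URL.
        parts.append(f'  <p><a href="{escape(url)}">Open in Telegram</a></p>')
    parts.append("</article>")
    return "\n".join(parts)
```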

similar

Return empty result set for MVP:

{
  "source": {},
  "results": []
}

federated_*

Return structured leaf response:

{
  "federation": {
    "configured": false,
    "reason": "telegram adapter is a leaf MCP base"
  }
}

Indexing Pipeline

Runtime components:

  1. Telegram client session.
  2. Source registry from config.
  3. Backfill job.
  4. Incremental cron job.
  5. Message normalizer.
  6. Embedding worker.
  7. Qdrant upsert.
  8. MCP HTTP server.

Backfill:

  • For each configured source, fetch message history newest-to-oldest.
  • Normalize messages into message points.
  • Upsert one dialog point per source.
  • Batch embeddings and Qdrant upserts.
  • Store a cursor per source: the id and date of the newest indexed message.

Incremental cron:

  • Every N minutes, fetch messages newer than cursor.
  • Upsert new messages.
  • Update dialog metadata point.
  • Re-embed edited messages if Telegram edit timestamp changes.
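
The cursor handling for new messages can be sketched as a pure function. It assumes message ids increase within a dialog and that the cursor is the newest indexed message id; the function name and the dict shape of a fetched message are hypothetical. Edit detection is not covered here.

```python
from typing import Any, Iterable, List, Mapping, Tuple


def select_new_messages(
    messages: Iterable[Mapping[str, Any]], cursor_message_id: int
) -> Tuple[List[Mapping[str, Any]], int]:
    """Return messages newer than the cursor (oldest first) and the advanced cursor."""
    fresh = sorted(
        (m for m in messages if m["message_id"] > cursor_message_id),
        key=lambda m: m["message_id"],
    )
    # Keep the old cursor when nothing new arrived.
    new_cursor = fresh[-1]["message_id"] if fresh else cursor_message_id
    return fresh, new_cursor
```

Persisting the returned cursor only after the Qdrant upsert succeeds would let a failed run be retried without losing messages.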

Deletions:

  • MVP can ignore Telegram deletions.
  • Stage 2: tombstone points with deleted_at_ts rather than hard-delete.

Implementation Steps

Step 1 — Adapter skeleton

  • Create a separate package/binary, for example cmd/federation-telegram-adapter.
  • Expose POST /_system/mcp.
  • Implement JSON-RPC initialize, tools/list, and tools/call.
  • Return all six tool definitions.

Acceptance:

  • initialize returns Telegram adapter instructions.
  • tools/list returns six tools.
  • An unknown method or tool returns a JSON-RPC error.
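
A trimmed dispatcher for this step might look like the sketch below. It is deliberately incomplete: a real MCP initialize result also carries protocol version, capabilities, and server info, tool definitions need input schemas, and tool results are normally wrapped in MCP content items. Using -32602 for an unknown tool and -32601 for an unknown method is an assumption based on the standard JSON-RPC 2.0 error codes.

```python
from typing import Any, Callable, Dict

TOOL_NAMES = [
    "search", "similar", "note_html",
    "federated_search", "federated_similar", "federated_note_html",
]
INSTRUCTIONS = "This is a Telegram federation adapter. Use search(query) for all retrieval."


def handle_rpc(request: Dict[str, Any],
               tools: Dict[str, Callable[[Dict[str, Any]], Any]]) -> Dict[str, Any]:
    """Route one JSON-RPC request to initialize, tools/list, or tools/call."""
    rid = request.get("id")
    method = request.get("method")
    params = request.get("params") or {}

    def ok(result: Any) -> Dict[str, Any]:
        return {"jsonrpc": "2.0", "id": rid, "result": result}

    def err(code: int, message: str) -> Dict[str, Any]:
        return {"jsonrpc": "2.0", "id": rid, "error": {"code": code, "message": message}}

    if method == "initialize":
        return ok({"instructions": INSTRUCTIONS})
    if method == "tools/list":
        return ok({"tools": [{"name": name} for name in TOOL_NAMES]})
    if method == "tools/call":
        name = params.get("name")
        if name not in tools:
            return err(-32602, f"unknown tool: {name}")
        return ok(tools[name](params.get("arguments") or {}))
    return err(-32601, f"method not found: {method}")
```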

Step 2 — Config and auth

  • Load adapter config from TOML/env.
  • Extract token from Authorization: Bearer <token> or ?token=....
  • Verify the token by hashing it and comparing the result in constant time against the configured/generated token hashes.
  • Map token to allowed_scopes.
  • Provide anonymous scopes only if explicitly configured.

Acceptance:

  • Valid token can query allowed scopes.
  • Missing/invalid token cannot infer private dialogs.
  • Requested scopes are intersected with allowed scopes.
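
Token verification against stored hashes can be sketched with the standard library. Plain SHA-256 is an assumption that suits high-entropy generated tokens; the function names and the name-to-hash table are hypothetical.

```python
import hashlib
import hmac
from typing import Mapping, Optional


def token_hash(token: str) -> str:
    """Hex SHA-256 of a token; only this hash is kept in config or storage."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def match_token(presented: str, hashes_by_name: Mapping[str, str]) -> Optional[str]:
    """Return the name of the configured token that matches, or None."""
    presented_hash = token_hash(presented)
    matched = None
    # Check every entry without an early exit so timing does not reveal the position.
    for name, stored_hash in hashes_by_name.items():
        if hmac.compare_digest(presented_hash, stored_hash):
            matched = name
    return matched
```

The returned name can then be used to look up allowed_scopes, so the raw token never needs to be stored or logged.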

Step 3 — Qdrant schema

  • Create collection if missing.
  • Ensure payload indexes listed above.
  • Store vector size/model in config.

Acceptance:

  • Fresh Qdrant instance can be initialized by adapter.
  • created_at_ts sorting works through scroll.
  • record_type, dialog_id, user_id, and scope_id filters work.

Step 4 — Telegram ingestion

  • Connect to Telegram session.
  • Backfill configured channels/groups/private dialogs.
  • Upsert dialog and message points.
  • Track cursors per source.

Acceptance:

  • One configured channel indexes messages into Qdrant.
  • Re-running indexing is idempotent.
  • Dialog metadata point updates last_message_at_ts.

Step 5 — search JSON DSL

  • Parse plain text as default semantic search.
  • Parse JSON query envelope for search, latest, dialogs, dialog_messages.
  • Map filters to Qdrant filters.
  • Map Qdrant points to trip2g SearchResultPayload.

Acceptance:

  • Plain text query returns vector-ranked telegram_message results.
  • {"op":"dialogs"} returns telegram_dialog results.
  • {"op":"latest","filter":{"user_ids":["tg:user:..."]}} returns newest messages sorted desc.
  • Unauthorized dialogs never appear.

Step 6 — note_html

  • Fetch Qdrant point by pid.
  • Render message/dialog as simple HTML/text.
  • Include source URL if available.

Acceptance:

  • note_html(pid) opens a message returned by search.
  • Missing or unauthorized pid returns "not found" without leaking whether the point exists.

Step 7 — Tests and smoke scenario

Use a fake Qdrant or a test Qdrant container for integration tests only; unit tests for helpers should not require Qdrant.

Required tests:

  • JSON DSL parsing.
  • Scope intersection.
  • Qdrant filter builder.
  • Result mapping.
  • Token extraction and verification.
  • Latest-by-user query.
  • Dialog listing query.
  • note_html by pid.

Manual smoke:

  1. Configure one private Telegram channel.
  2. Run backfill.
  3. Add KB-note in trip2g hub:
---
mcp_federation_kb_url: https://telegram-adapter.local/_system/mcp
mcp_federation_kb_id: telegram
---
Use when: private Telegram channels, team chats, dialog history, operational memory.

Store the adapter token in the hub's federation secret store and send it as Authorization: Bearer <token>. For clients that cannot set headers, use https://telegram-adapter.local/_system/mcp?token=<token> as the MCP URL instead.

  4. Ask hub:
{"query":"{\"op\":\"dialogs\",\"limit\":20}"}
  5. Ask hub:
{"query":"{\"op\":\"latest\",\"filter\":{\"user_ids\":[\"tg:user:123456\"]},\"limit\":100}"}
  6. Ask hub:
{"query":"what did the team decide about federation telegram adapter?"}

Expected: all three work through federated_search(query=..., kb_id="telegram").
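
The structured smoke queries are JSON encoded twice: the envelope becomes a string, and that string is the value of query in the tool arguments. A small helper makes the escaping less error-prone. The helper name is hypothetical, and kb_id defaults to the value used in the KB-note above.

```python
import json
from typing import Any, Dict


def federated_search_arguments(envelope: Dict[str, Any], kb_id: str = "telegram") -> Dict[str, str]:
    """Wrap a DSL envelope as arguments for federated_search(query=..., kb_id=...)."""
    # First encoding: the envelope becomes the query string.
    query = json.dumps(envelope, separators=(",", ":"))
    return {"query": query, "kb_id": kb_id}


if __name__ == "__main__":
    args = federated_search_arguments({"op": "dialogs", "limit": 20})
    # Second encoding happens when the arguments are serialized for the request.
    print(json.dumps(args))
```

Plain-text questions skip the first encoding and go into query unchanged.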

Deferred

  • Automatic extraction into decision, task_state, and profile_fact records.
  • Thread/window summaries.
  • Deletion sync.
  • Media transcription and OCR.
  • Keyword/BM25 search in addition to vector search.
  • ActivityPub/Mastodon adapter.
  • Federation Agreement Metadata.
  • Billing/revenue logic.

Open Design Notes

  • Qdrant is acceptable here because adapter data is external, high-volume, and filter-heavy. This does not contradict docs/dev/why_not_qdrant.md, which argues against moving trip2g's native note search to Qdrant at current scale.
  • The JSON-in-query DSL is intentionally ugly but useful. It avoids changing the federation hub and lets agents discover dialogs through a single leaf MCP adapter.
  • If this becomes user-facing beyond demos, consider adding adapter-specific tools outside federation. For now, federation compatibility is more important than aesthetic purity.