Agent-Runtime: 3-Way Implementation Decision (ours / Codex / Opus)
1. TL;DR Verdict
Yes — ours (feat/agent-runtime) is the right base to ship. Across all six seams ours is either the outright winner or the most-hardened variant, and it is the only implementation with tests (loop, jsonnet, janitor, scope, fleet). Ours wins the two security-critical seams outright (concurrency race-safety, jsonnet redaction) and ties on the loop's security axis. The competitors are unhardened single-pass implementations — but each made 2-3 genuinely better local design choices that ours should graft in.
Do not rewrite. Cherry-pick. Ours' weaknesses are localized correctness bugs and DRY gaps, not architectural mistakes. The highest-value work is fixing ours' own bugs (some of which we'd fix even without borrowing anyone's code), then grafting Opus's cleaner abstractions and Codex's reusable helpers.
Ranked adopt list (value / effort):
| Rank | Adopt | From | Effort | Why it matters |
|---|---|---|---|---|
| 1 | Cron attach_notes materialization (shared MaterializeAttachedNotes) |
Codex | high | Fixes a real functional gap: cron roles get zero context notes today |
| 2 | RemoteKB overlay sync on Write/Patch | Opus | low | Fixes read-after-write staleness in a single fleet run |
| 3 | Reject ambiguous (non-unique) find in patch |
Opus | low | Fixes silent wrong-card patch in FileKB |
| 4 | Move read-scope check into canreadnote chokepoint |
Codex | medium | Closes the similarNotes scope leak for free |
| 5 | Explicit WebhookScoped flag for read+write enforcement |
Opus | medium | Fixes read fail-open on empty read_patterns |
| 6 | Centralize concurrency_mode enum in webhookutil |
Codex | low | Kills 5x enum duplication |
| 7 | Named Kind constants + IsPatch() + explicit write Kind |
Opus | low | Removes magic strings, symmetric write/patch |
| 8 | Patcher optional interface (split out of core KB) | Opus | medium | Correct ISP, matches the named seam, frees test KB |
| 9 | Config gaps: ceilings>0, offered-tools non-empty, healthz, api_token 400 | Opus | low/med | Daemon hardening ours lacks |
| 10 | Canonical/compact EvalJSON output | Opus | low | Cleaner signed/sent/logged bytes |
2. Seam Winner Table
| Seam | Winner | What to adopt (from whom) | Effort |
|---|---|---|---|
| core-loop | mixed (ours' enforcement + Opus's KB design) | Patcher interface, uniqueness guard, Kind constants (Opus); defensive Kind validation (Codex) | low-med |
| jsonnet-transform | ours | Canonical compact output (Opus); empty-src fast path (Codex) | low |
| concurrency | ours | webhookutil concurrency centralization (Codex) | low |
| scope-enforcement | mixed (ours' coverage + Codex chokepoint + Opus flag) | canreadnote chokepoint (Codex); WebhookScoped flag + resolved-path match (Opus) |
medium |
| fleet | ours | overlay sync, Config.Validate, healthz, transform passthrough (Opus) | low-high |
| attach-attrib-migrations | mixed (ours' concurrency + Codex attach helpers) | shared MaterializeAttachedNotes for both kinds (Codex); richer Meta allowlist (Opus); coalesce spend (Codex) | low-high |
3. Prioritized Adopt List
Clear wins (pull into feat/agent-runtime)
-
Cron attach_notes materialization — from Codex. Extract
internal/webhookutil/attached_notes.gowith a singleMaterializeAttachedNotes(patterns, nvs)+AttachGateSatisfied, and call the same function from bothdeliverchangewebhookanddelivercronwebhook. Beats ours because ours materializes for change webhooks only — cron roles withattach_notesconfigured receive nothing despite the column/gate existing. Also de-duplicates ours' inlined materialize+apply logic. Effort: high. -
RemoteKB overlay sync on Write/Patch — from Opus (
remotekb.go:107,120; also Codexremotekb.go:60,74). Setoverlay[path]=contenton write; apply the single find→replace tooverlay[path]on patch. Beats ours because ours'RemoteKB.Write/Patchnever update the overlay, so a read after a same-run edit returns stale attached content. Effort: low. -
Reject ambiguous
findin the patch path — from Opus (scope.go:103-104; Codexfilekb.go:115-117). Error whenfindoccurs zero or >1 times; instruct the model in the system prompt to add surrounding context for uniqueness (Opusruntime.go:82). Beats ours becauseFileKB.Patchusesstrings.Replace(..., 1)and silently patches the first of N matches — a wrong-card hazard for surgical kanban edits. Effort: low. -
canreadnote read-scope chokepoint — from Codex (
case/canreadnote/resolve.go:24). Inject theread_patternscheck at the top ofcanreadnote.Resolve. Beats ours because ours duplicates the read check in 3 places (note resolver, notePaths filter, sitesearch) and missedsimilarNotes, letting a scoped agent read foreign note content. Keep ours' notePaths filter (it bypasses canreadnote) and ours' globalAroundOperationsstamp. Effort: medium. -
Explicit
WebhookScopedflag — from Opus (appreq/request.go:355). Addreq.WebhookScoped bool+Scoped()/ReadPatterns()/WritePatterns()helpers set in every shortapitoken parse path; drive both read and write enforcement offScoped(). Beats ours because ours keys reads onlen(rp)>0(fail-open on empty patterns) and writes onDeliveryKind!=""— two inconsistent signals; the empty-read_patternscase currently reads everything. Effort: medium. -
Centralize concurrency_mode enum — from Codex (
webhookutil/concurrency.go). Constants +NormalizeConcurrencyMode+ValidateConcurrencyMode, used in the dispatch switch and all four CRUD resolvers. Beats ours' 5 independent copies of{allow_overlap,skip,queue_one}(drift risk). No behavior change to ours' superior atomic SQL guard. Effort: low. -
Named Kind constants + explicit write Kind — from Opus (
agentresponse.go:13-14,40-42). Replace bare"patch"magic strings withAgentChangeKindWrite/Patch+IsPatch(), and stamp an explicit Kind onwrite_note. Beats ours' empty-Kind-on-write asymmetry that leans onempty==writebackward-compat. Effort: low. -
Coalesce delivery spend — from Codex.
UpdateWebhookDeliveryResultshould settokens_used/stepsviacoalesce(sqlc.narg(...), tokens_used)instead of plain assignment. Beats ours' NULL-clobber risk (safe only because spend is single-write today). Effort: low.
Optional (quality, no behavior break)
-
Patcher optional interface — from Opus (
kb.go:36-38+scope.go:88-108). SplitPatchout of the core KB interface;ScopedKB.Patchdelegates toPatcherwhen implemented, else a single-unique-occurrence RMW fallback. Correct ISP, literal match for the seam's named "KB/Patcher interface", and lets the test memKB drop its Patch boilerplate. Pairs naturally with adopt #3. Opus'sremotekb.goalready proves the payoff with a compile-time_ agentruntime.Patcherassertion. Effort: medium. -
Config.Validate() method — from Opus (
config.go:40-77). Move validation+normalization ontofleet.Config, addingTokenCeiling/StepCeiling>0and non-emptyOfferedToolsguards (ours'main.govalidateConfig omits these —-token-ceiling 0yieldsMaxTokens 0). Effort: medium. -
healthz + api_token 400 guard — from Opus (
fleet.go:83,119; Codexhandler.go:68). Standard daemon ergonomics; fail fast instead of an eventual 502. Effort: low. -
TransformJsonnet on Role — from Opus. Add the field and thread it into
reconcile.createinstead of hardcoding"". Closes a real feature gap (roles can't configure an outbound transform). Effort: medium. -
Richer Meta allowlist — from Opus (
AllowlistMeta+MetaTags). Expose status/route/routes/lang (not just tags+layout), asmap[string]anyso non-string frontmatter survives. Effort: medium. -
Canonical compact EvalJSON output — from Opus (
jsonneteval.go:45-56). Round-trip throughjson.Unmarshal→json.Marshal. Beats ours' raw go-jsonnet output (indented + trailing newline) for the triple-duty signed/sent/logged body. Effort: low. -
Validate empty-src fast path — from Codex (
jsonneteval.go:42-45). Makejsonneteval.Validatefoolproof regardless of caller guards. Effort: low. -
Resolved-path Note match — from Opus (
schema.resolvers.go:3169). Match the Note read check onresponse.Note.Pathrather than ours' permalink→fsPath indirection that can return""and fall back to a URL-namespace path. Effort: low. -
Defensive Kind validation — from Codex (
agentresponse.go:44).ozzo.In("write","patch")inAgentChange.Validateto fail fast on garbage Kind. Effort: low.
Skip
- Opus's in-process
sync.Mutexconcurrency guard — strictly worse than ours' atomic SQL conditional insert; breaks under multi-instance (LiteFS leader+replica), which the repo's read-replica branch makes a real concern. - Codex's constant marker version (
#v1) — security regression: the per-role HMAC secret never rotates on spec change. Ours' specVer-in-marker rotation is correct. - Codex's flat
-1 hourstale window — correctness regression vs ours' per-webhooktimeout_seconds+30s: reaps long agent runs and blocks crashed fast webhooks for an hour. - Opus's denylist ExtVar redaction — fail-open; keep ours' fail-closed allowlist. Same for Codex's denylist + hardcoded
meta:"{}". - Opus's whole-body
payloadExtVar /transformExtVarsdead error branch — keep ours' scoped change/attached_notes/meta vars.
4. Bugs / Weaknesses in OURS (fix regardless of adoption)
Highest priority — these are correctness/security defects the comparison surfaced. Several should be fixed even if we borrow none of their code.
-
[functional] Cron attach_notes silently dropped.
cronWebhookPayload(delivercronwebhook/resolve.go:57-63) has noAttachedNotesfield and the cron deliver job never materializes — a cron role withattach_notesconfigured receives zero context notes though column/gate/CRUD all exist. (Adopt #1) -
[security] similarNotes is unscoped.
case/similarnotes/resolve.go:137callsenv.CanReadNote, but ours never added scope tocanreadnoteand the read check lives only in the note resolver + sitesearch — a scoped agent can read content/similarity of notes outside itsread_patterns. (Adopt #4) -
[security] Read enforcement fails OPEN on empty scope.
schema.resolvers.go:3154/helpers.go:69gate onlen(rp)>0, so a scoped token with emptyread_patternsreads everything — inconsistent with the fail-closed write path. (Adopt #5) -
[correctness] FileKB.Patch first-match replacement.
filekb.go:111usesstrings.Replace(content, find, replace, 1)with no uniqueness guard — an ambiguousfindpatches the wrong (first) location with no signal to the model. (Adopt #3) -
[correctness] RemoteKB read-after-write staleness.
Write/Patch(remotekb.go:105-115) never update theattach_notesoverlay; a read after a same-run edit returns stale content. (Adopt #2) -
[config] Missing ceiling/offered-tools guards.
validateConfigomitsTokenCeiling/StepCeiling>0(-token-ceiling 0→MaxTokens 0) and non-emptyOfferedTools(empty set silently disables all role tools). (Adopt #10) -
[robustness] Spend NULL-clobber.
UpdateWebhookDeliveryResultuses plaintokens_used = ?; a later status-only update would wipe recorded spend. Safe only because spend is single-write today. (Adopt #8) -
[leak, shared] resolveWikilinks unscoped. Returns
target.Path+URL with noread_patternscheck — leaks existence/paths of out-of-scope notes. All three share this; ours should still fix it. -
[DRY] concurrency_mode enum in 5 places (4 CRUD validators + inline dispatch literals). (Adopt #6)
-
[design smell] Patch on the core KB interface forces every backend (incl. trivial test memKB) into RMW boilerplate; spec names a "KB/Patcher interface". (Adopt #9)
-
[design smell] Asymmetric / magic Kind.
write_noteemits empty Kind while patch sets"patch"; bare magic string, no constant. (Adopt #7) -
[non-canonical] EvalJSON ships indented + trailing-newline JSON as the signed/sent/logged body. (Adopt #14)
-
[cosmetic] Dead defensive guards.
where sqlc.arg(stale_window) is not null(always true);heartbeat_atin the skip-guard coalesce but never written byMarkWebhookDeliveryRunning;ExpireStaleWebhookDeliveriesvsExpireStaleCronWebhookDeliveriesname asymmetry. Either wire a real heartbeat or drop the column from the coalesce.
5. Honest Closing
Ship ours as the base — it is already the strongest and the only tested/hardened tree. Ours wins the two seams where a mistake is expensive and hard to reverse: concurrency (atomic SQL conditional insert is race-safe across goroutines and processes; Codex's flat stale window and Opus's process-local mutex both regress) and jsonnet redaction (fail-closed allowlist + F9 transform/pass_api_key/https guards the others entirely lack). On the core loop it ties Opus on security (execution-time allowlist) while keeping richer Denials observability. That foundation is not on offer from either competitor.
How much better can cherry-picking make it? Materially, but bounded — the gains are localized, not architectural:
- Two fixes are non-negotiable before shipping: cron attach_notes materialization (#1, a silent feature gap) and the similarNotes scope leak (#2/#3, a real read-authorization hole). Both are ours' own bugs, both have clean reference implementations to copy.
- Three more are cheap, high-value correctness fixes: overlay sync (#2-list), patch uniqueness guard (#3-list), spend coalesce (#8-list).
- The rest (Patcher interface, Kind constants, enum centralization, Config.Validate, healthz, canonical output) are quality/ergonomics that raise the floor and reduce drift, but none block shipping.
Net: cherry-picking the "clear wins" closes every correctness and security gap the comparison revealed and absorbs the competitors' best abstractions — while ours keeps the tests, multi-instance race-safety, fail-closed redaction, and observability the others never built. The right move is ours as base + ~6 grafts (Opus's KB/overlay/config polish, Codex's attach+concurrency helpers), not a rebuild. Sequence the two security/functional fixes first; treat the abstraction cleanups as fast-follows.