Weekly Self-Audit Summary
Audited 10,129 session records from 67 log streams across the
last 7 days using the same 32-rule engine that powers the
paid LLM Bill Triage Deep Report. This is Milo auditing Milo — the trust
substrate behind every paid audit. Findings are surfaced raw; nothing is
cherry-picked.
Audit window: 2026-05-08 → 2026-05-15 ·
Engine: 32-rule library at ~/.hermes/lib/milo_control/agent_audit/rules.yaml ·
Identity firewall: applied to every evidence excerpt before render
Findings (sorted by severity, then confidence)
What we saw
Critic-vs-strategist recursion: every proposal vetoed with "research_first:" or "first_principles:" — strategist tries to research → critic vetoes the research → infinite loop. Net progress = zero. Signature: critic_nonconcur > 50% of recent ticks AND most veto-alternatives start with the literal "research_first:" or "first_principles:" prefix.
Evidence
{"ts": "2026-05-06T08:58:30.182155+00:00", "schema": "NoveltyOrchestratorTick.v1", "dry_run": false, "ok": true, "digest_keys": ["active_action_queue_count", "autonomous_loop_last", "generated_at", "governor_band", "market_research_heads", "minimax_5h_pct", "minimax_rolling_pct",
Fix recipe
Action: Tune the critic prompt to DEFAULT TO CONCUR on operational/repair proposals (only veto genuinely destructive actions or new strategic commitments). Inject the same diagnostics_backlog into BOTH critic and strategist so they share context. Make veto alternatives shippable action_kinds, not abstract questions.
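The concur-by-default rule can be sketched as a tiny verdict function. The action_kind labels below are hypothetical placeholders, not Milo's real schema; the point is that a veto must carry a shippable action_kind as its alternative, never a "research_first:" prompt.

```python
# Assumed labels -- substitute the real action_kind taxonomy.
DESTRUCTIVE_KINDS = {"delete_state", "rotate_keys", "spend_budget"}
STRATEGIC_KINDS = {"new_market_commitment", "new_product_line"}

def critic_verdict(proposal: dict) -> dict:
    """Default to CONCUR; veto only destructive or new-strategic kinds."""
    kind = proposal.get("action_kind", "")
    if kind in DESTRUCTIVE_KINDS or kind in STRATEGIC_KINDS:
        # The alternative is itself executable, so a veto still makes progress.
        return {"concur": False,
                "alternative": {"action_kind": "dry_run_" + kind,
                                "note": "rehearse in dry_run before committing"}}
    return {"concur": True}
```

Because operational/repair kinds fall through to the concur branch, the recursion signature (every veto prefixed "research_first:") becomes structurally impossible.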
What we saw
A lock file (claim/mutex) is older than 4x its declared TTL but still present. The holder crashed without releasing — every subsequent agent that respects the lock blocks indefinitely. Caught multiple times in Milo's multi-agent claim/release/audit/GC protocol when the GC step itself crashed.
Evidence
/Users/miloantaeus/.hermes/ops/control/state/capacity_router.v25.latest.json age=237447s (expected ≤ 14400s)
Fix recipe
Action: The GC step MUST run independently of the holder (separate cron, not in-process cleanup). Stamp every lock with `expires_at` (now + ttl), and have ANY consumer treat `now > expires_at` as evidence the lock is forfeit. Add a sentinel cron that age-scans the locks directory every 60s and removes expired entries.
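A minimal sketch of the expires_at stamp plus the independent sentinel pass, assuming JSON lock files under a hypothetical locks directory (the path and `*.lock` naming are illustrative, not Milo's actual layout):

```python
import glob
import json
import os
import time

LOCK_DIR = os.path.expanduser("~/.hermes/ops/control/locks")  # assumed path

def write_lock(path: str, ttl_s: float) -> None:
    """Stamp the lock with an absolute expiry so ANY reader can judge it."""
    with open(path, "w") as f:
        json.dump({"holder_pid": os.getpid(),
                   "expires_at": time.time() + ttl_s}, f)

def gc_expired_locks(lock_dir: str = LOCK_DIR) -> list:
    """Sentinel pass (run from its own cron, never in-process with the
    holder): remove any lock whose expires_at is in the past."""
    removed = []
    for path in glob.glob(os.path.join(lock_dir, "*.lock")):
        try:
            with open(path) as f:
                meta = json.load(f)
        except (OSError, json.JSONDecodeError):
            meta = {"expires_at": 0}  # unreadable lock counts as forfeit
        if time.time() > meta.get("expires_at", 0):
            os.remove(path)
            removed.append(path)
    return removed
```

Because the sentinel only reads `expires_at`, it survives any holder crash; a dead holder's lock is forfeited at most one scan interval after expiry.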
What we saw
Action reports ok=true with duration_s=0 — almost certainly a no-op that fast-returned without doing the actual work. Common when a circuit breaker or feature flag short-circuits but the success path still emits ok=true.
Evidence
{"ts": "2026-05-15T10:21:08.473621+00:00", "action_id": "minimax_capacity_burst", "outcome": {"scanned_at": "2026-05-15T10:20:33.712047+00:00", "bottleneck": {"bottleneck": "minimax_underutilization", "severity": "critical", "detail": "rolling_utilization_pct=0.0% (target >25%)",
Fix recipe
Action: Find the early-return path in the action's executor. Distinguish "skipped on purpose" (skipped=true, ok=null OR skip_reason set) from "ran successfully" (ok=true, duration_s > 0.05). Audit the function for any `return {"ok": True}` that doesn't follow real I/O.
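The skip/ran/no-op distinction reduces to a small classifier over the report fields. Field names follow the conventions in the recipe above (ok, skipped, skip_reason, duration_s); adjust to the real report schema:

```python
def classify_outcome(report: dict) -> str:
    """Separate a deliberate skip, a real run, a suspect no-op, and a failure.
    Threshold of 0.05s mirrors the recipe's 'real I/O happened' floor."""
    if report.get("skipped") or report.get("skip_reason"):
        return "skipped"          # circuit breaker / feature flag, on purpose
    if report.get("ok") and report.get("duration_s", 0) > 0.05:
        return "ran"              # success with measurable work
    if report.get("ok"):
        return "suspect_noop"     # ok=true but zero duration: audit the executor
    return "failed"
```

Routing "suspect_noop" into the same alerting path as failures is what catches the `return {"ok": True}` early-exits this rule targets.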
What we saw
Strategist proposes the SAME exact target ≥5 times in an hour. Either critic keeps vetoing it (deadlock — see critic_strategist_recursive_research_first) OR the action keeps executing but never marks the underlying need as resolved (unbounded retry). Either way, no progress.
Evidence
top_target='verify_public_blog_reachability' count=34
Fix recipe
Action: Add a per-target dedup counter to the strategist's prompt context. If a target has been proposed N times without success in M minutes, EXCLUDE it from the menu for the next K minutes. Force the strategist to pick a different angle.
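The N-times-in-M-minutes exclusion window can be sketched with a per-target timestamp deque. Class and parameter names here are hypothetical; wire the `allowed()` check into wherever the strategist's menu is assembled:

```python
import time
from collections import defaultdict, deque

class TargetDeduper:
    """Exclude a target from the strategist's menu after it has been
    proposed max_proposals times within window_s without success."""

    def __init__(self, max_proposals: int = 5, window_s: float = 3600,
                 cooloff_s: float = 1800):
        self.max_proposals = max_proposals
        self.window_s = window_s
        self.cooloff_s = cooloff_s
        self.history = defaultdict(deque)   # target -> proposal timestamps
        self.excluded_until = {}            # target -> unix time

    def record(self, target: str, now: float = None) -> None:
        now = time.time() if now is None else now
        q = self.history[target]
        q.append(now)
        while q and now - q[0] > self.window_s:   # drop stale entries
            q.popleft()
        if len(q) >= self.max_proposals:
            self.excluded_until[target] = now + self.cooloff_s

    def allowed(self, target: str, now: float = None) -> bool:
        now = time.time() if now is None else now
        return now >= self.excluded_until.get(target, 0)
```

Under these defaults, the `top_target='verify_public_blog_reachability' count=34` pattern above would have been cut off at 5, forcing a different angle for the next 30 minutes.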
What we saw
A snapshot/checkpoint file is older than its declared SLA (e.g. "refreshed every 1h" but mtime is 6h old). Downstream consumers that use the snapshot for grounding/recovery will operate on stale data. If recovery is triggered, the agent will roll back to a stale state and re-do hours of work.
Evidence
/Users/miloantaeus/.hermes/ops/control/state/capacity_router.v25.latest.json age=237448s (expected ≤ 10800s)
Fix recipe
Action: Make every snapshot writer emit (a) the file, (b) a `<file>.sla.json` declaring `refresh_interval_s` and `produced_at`. Add a generic auditor cron that scans for any `*.sla.json` and flags age > 2 * refresh_interval_s. Page on age > 5x.
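A sketch of the generic auditor, assuming the `<file>.sla.json` sidecar convention described above (the tuple shape and directory walk are illustrative):

```python
import glob
import json
import os
import time

def audit_sla_files(root: str, now: float = None) -> list:
    """Scan for <file>.sla.json sidecars and flag stale snapshots.
    Returns (data_path, age_s, severity) tuples using the 2x-warn /
    5x-page thresholds from the recipe above."""
    now = time.time() if now is None else now
    findings = []
    for sla_path in glob.glob(os.path.join(root, "**", "*.sla.json"),
                              recursive=True):
        with open(sla_path) as f:
            sla = json.load(f)
        data_path = sla_path[:-len(".sla.json")]
        if not os.path.exists(data_path):
            findings.append((data_path, None, "missing"))
            continue
        age = now - os.path.getmtime(data_path)
        interval = sla["refresh_interval_s"]
        if age > 5 * interval:
            findings.append((data_path, age, "page"))
        elif age > 2 * interval:
            findings.append((data_path, age, "warn"))
    return findings
```

Because the auditor keys off the sidecar, every new snapshot writer is covered automatically the moment it emits an SLA file, with no per-file rule to maintain.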
What we saw
A *.latest.json that's expected to be refreshed by a cron is older than 3x its interval — strong signal the cron stopped firing OR its writer is silently failing. Common: cron `last_status: ok` but state file mtime is days old.
Evidence
/Users/miloantaeus/.hermes/ops/control/state/capacity_router.v25.latest.json age=237447s (expected ≤ 10800s)
Fix recipe
Action: For each stale file: (a) check its writing cron is loaded (launchctl list | grep <label>), (b) tail the err.log, (c) kickstart manually and verify mtime advances. If kickstart doesn't refresh, the writer code has an exception caught by an over-broad try/except.
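Step (c) of the triage can be automated: trigger the writer, then confirm the file's mtime actually advances. The sketch below takes the kickstart as a callable so it is writer-agnostic; in production you would pass a subprocess wrapper around `launchctl kickstart -k <label>` (label and paths are whatever your plist declares):

```python
import os
import time

def verify_writer(path: str, kickstart, timeout_s: float = 30,
                  poll_s: float = 1.0) -> bool:
    """Kickstart the writer, then poll for an advancing mtime.
    True  -> writer is alive; the cron simply wasn't firing.
    False -> writer runs but never refreshes: look for a swallowed
             exception inside an over-broad try/except."""
    before = os.path.getmtime(path) if os.path.exists(path) else 0
    kickstart()
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if os.path.exists(path) and os.path.getmtime(path) > before:
            return True
        time.sleep(poll_s)
    return False
```

Running this for every `*.latest.json` after a deploy turns the manual (a)/(b)/(c) checklist into a single pass/fail signal per writer.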
What we saw
(checker raised ValueError: could not convert string to float: 'any_match'; the rule itself crashed before it could evaluate this signal, which is a bug in the rule, not a clean pass)
Evidence
(no excerpt captured by checker)
Fix recipe
Action: Stamp every state-file write with a producer-PID and code-version header; compare on read and `launchctl kickstart -k` any daemon whose version header is older than the deployed code. Add a `last_writer_journal` companion file that the writer appends to on every real update — an mtime that advances without a matching journal entry exposes a writer touching the file without doing real work.
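A sketch of the header-stamp plus journal convention, assuming JSON state files; `CODE_VERSION` is a hypothetical stand-in for whatever version string your deploy process embeds:

```python
import json
import os
import time

CODE_VERSION = "2026.05.15"  # hypothetical; inject from deploy metadata

def write_state(path: str, payload: dict) -> None:
    """Every real update stamps producer PID + code version and appends
    a journal entry, so mtime and journal can be cross-checked."""
    record = {"producer_pid": os.getpid(),
              "code_version": CODE_VERSION,
              "written_at": time.time(),
              "payload": payload}
    with open(path, "w") as f:
        json.dump(record, f)
    with open(path + ".last_writer_journal", "a") as j:
        j.write(json.dumps({"ts": record["written_at"],
                            "pid": record["producer_pid"]}) + "\n")

def read_state(path: str, deployed_version: str = CODE_VERSION) -> dict:
    """Compare the version header on read; a stale header means the
    producing daemon predates the deployed code and needs a restart."""
    with open(path) as f:
        record = json.load(f)
    record["version_stale"] = record.get("code_version") != deployed_version
    return record
```

Consumers that see `version_stale=True` can trigger the kickstart themselves instead of silently grounding on output from an outdated daemon.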