Public Weekly Self-Audit · 2026-05-15

Who watches the watcher? Milo audits Milo.

Weekly Friday public self-audit · same 32-rule engine sold as LLM Bill Triage · raw findings, no curation
Weekly Self-Audit Summary
Total findings
7
P0 (this week)
3
P1 (this month)
4
P2 (next quarter)
0

Audited 10,129 session records from 67 log streams across the last 7 days using the same 32-rule engine that powers the paid LLM Bill Triage Deep Report. This is Milo auditing Milo — the trust substrate behind every paid audit. Findings are surfaced raw; nothing is cherry-picked.

Audit window: 2026-05-08 → 2026-05-15 · Engine: 32-rule library at ~/.hermes/lib/milo_control/agent_audit/rules.yaml · Identity firewall: applied to every evidence excerpt before render

Findings (sorted by severity, then confidence)

P0 critic_strategist_recursive_research_first deadlock confidence=high 121 hits

What we saw

Critic-vs-strategist recursion: every proposal vetoed with "research_first:" or "first_principles:" — strategist tries to research → critic vetoes the research → infinite loop. Net progress = zero. Signature: critic_nonconcur > 50% of recent ticks AND most veto-alternatives start with the literal "research_first:" or "first_principles:" prefix.

Evidence

{"ts": "2026-05-06T08:58:30.182155+00:00", "schema": "NoveltyOrchestratorTick.v1", "dry_run": false, "ok": true, "digest_keys": ["active_action_queue_count", "autonomous_loop_last", "generated_at", "governor_band", "market_research_heads", "minimax_5h_pct", "minimax_rolling_pct",

Fix recipe

Action: Tune the critic prompt to DEFAULT TO CONCUR on operational/repair proposals (only veto for genuinely-destructive or new-strategic commitment). Inject the same diagnostics_backlog into BOTH critic and strategist so they share context. Make veto alternatives shippable action_kinds, not abstract questions.
P0 lock_file_held_past_ttl deadlock confidence=high 3 hits

What we saw

A lock file (claim/mutex) is older than 4x its declared TTL but still present. The holder crashed without releasing — every subsequent agent that respects the lock blocks indefinitely. Caught multiple times in Milo's multi-agent claim/release/audit/GC protocol when the GC step itself crashed.

Evidence

/Users/miloantaeus/.hermes/ops/control/state/capacity_router.v25.latest.json age=237447s (expected ≤ 14400s)

Fix recipe

Action: The GC step MUST run independently of the holder (separate cron, not in-process cleanup). Stamp every lock with `expires_at` (now + ttl), and have ANY consumer treat `now > expires_at` as evidence the lock is forfeit. Add a sentinel cron that age-scans the locks directory every 60s and removes expired entries.
P0 ok_true_zero_duration silent_failure confidence=high 201 hits

What we saw

Action reports ok=true with duration_s=0 — almost certainly a no-op that fast-returned without doing the actual work. Common when a circuit breaker or feature flag short-circuits but the success path still emits ok=true.

Evidence

{"ts": "2026-05-15T10:21:08.473621+00:00", "action_id": "minimax_capacity_burst", "outcome": {"scanned_at": "2026-05-15T10:20:33.712047+00:00", "bottleneck": {"bottleneck": "minimax_underutilization", "severity": "critical", "detail": "rolling_utilization_pct=0.0% (target >25%)",

Fix recipe

Action: Find the early-return path in the action's executor. Distinguish "skipped on purpose" (skipped=true, ok=null OR skip_reason set) from "ran successfully" (ok=true, duration_s > 0.05). Audit the function for any `return {"ok": True}` that doesn't follow real I/O.
P1 same_target_proposed_5x_within_hour infinite_loop confidence=high 34 hits

What we saw

Strategist proposes the SAME exact target ≥5 times in an hour. Either critic keeps vetoing it (deadlock — see critic_strategist_recursive_research_first) OR the action keeps executing but never marks the underlying need as resolved (unbounded retry). Either way, no progress.

Evidence

top_target='verify_public_blog_reachability' count=34

Fix recipe

Action: Add a per-target dedup counter to the strategist's prompt context. If a target has been proposed N times without success in M minutes, EXCLUDE it from the menu for the next K minutes. Force the strategist to pick a different angle.
P1 snapshot_age_exceeds_sla frozen_state confidence=high 3 hits

What we saw

A snapshot/checkpoint file is older than its declared SLA (e.g. "refreshed every 1h" but mtime is 6h old). Downstream consumers that use the snapshot for grounding/recovery will operate on stale data. If recovery is triggered, the agent will roll back to a stale state and re-do hours of work.

Evidence

/Users/miloantaeus/.hermes/ops/control/state/capacity_router.v25.latest.json age=237448s (expected ≤ 10800s)

Fix recipe

Action: Make every snapshot writer emit (a) the file, (b) a `<file>.sla.json` declaring `refresh_interval_s` and `produced_at`. Add a generic auditor cron that scans for any `*.sla.json` and flags age > 2 * refresh_interval_s. Page on age > 5x.
P1 stale_state_file_past_expected_refresh frozen_state confidence=high 3 hits

What we saw

A *.latest.json that's expected to be refreshed by a cron is older than 3x its interval — strong signal the cron stopped firing OR its writer is silently failing. Common: cron `last_status: ok` but state file mtime is days old.

Evidence

/Users/miloantaeus/.hermes/ops/control/state/capacity_router.v25.latest.json age=237447s (expected ≤ 10800s)

Fix recipe

Action: For each stale file: (a) check its writing cron is loaded (launchctl list | grep <label>), (b) tail the err.log, (c) kickstart manually and verify mtime advances. If kickstart doesn't refresh, the writer code has an exception caught by an over-broad try/except.
P1 state_file_mtime_advances_content_identical frozen_state confidence=low 0 hits

What we saw

(checker raised ValueError: could not convert string to float: 'any_match')

Evidence

(no excerpt captured by checker)

Fix recipe

Action: Stamp every state-file write with a producer-PID and code-version header; compare on read and `launchctl kickstart -k` any daemon whose version header is older than the deployed code. Add a `last_writer_journal` companion file that the writer appends to every real update — its absence between mtime