Sample · Agent Health Audit Deep Report

Live self-audit: Milo Antaeus session logs

Generated 2026-05-11T23:44:43Z · 450 JSONL records analyzed · 4 state files checked · 32-rule library
Audit Summary

Total findings: 5
P0 (today): 3
P1 (this week): 2
P2 (next sprint): 0

Top concerns: critic-strategist deadlock causing 94 wasted ticks in 24h, 80 silent-success records where actions short-circuited but reported success, plus a stale lock file that's been held past TTL — three independent failure modes converging on the same compute waste pattern.

Findings (sorted by severity, then confidence)

P0 critic_strategist_recursive_research_first deadlock confidence=high 94 hits

What we saw

Critic-vs-strategist recursion: every proposal vetoed with "research_first:" or "first_principles:" — strategist tries to research → critic vetoes the research → infinite loop. Net progress = zero. Signature: critic_nonconcur > 50% of recent ticks AND most veto-alternatives start with the literal "research_first:" or "first_principles:" prefix.

Evidence

{"ts": "2026-05-08T16:33:27.467499+00:00", "schema": "AutonomousLoopTick.v1", "dry_run": false, "ok": false, "decision": {"action_kind": "research_first:investigate_root_cause"}}

Fix recipe

Action: Tune the critic prompt to DEFAULT TO CONCUR on operational/repair proposals (veto only genuinely destructive actions or new strategic commitments). Inject the same diagnostics_backlog into BOTH the critic and the strategist so they share context. Make veto alternatives shippable action_kinds, not abstract questions.
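
One way to enforce the last point mechanically, sketched under an assumed critic output shape (the field names and the action_kind registry below are hypothetical):

```python
ABSTRACT_PREFIXES = ("research_first:", "first_principles:")

# Hypothetical registry of action_kinds the executor can actually run.
ALLOWED_ACTION_KINDS = {"repair_cron", "restart_service", "rotate_lock", "refresh_snapshot"}

def validate_critic_verdict(verdict: dict) -> dict:
    """Downgrade vetoes whose alternative is not a shippable action_kind.

    `verdict` is assumed to look like {"concur": bool, "alternative": str};
    adjust to the real critic output schema.
    """
    if verdict.get("concur"):
        return verdict
    alt = verdict.get("alternative", "")
    if alt.startswith(ABSTRACT_PREFIXES) or alt not in ALLOWED_ACTION_KINDS:
        # No shippable alternative: treat as concur so the loop keeps moving,
        # and keep a note for prompt tuning.
        return {"concur": True, "note": f"veto discarded, non-shippable alternative: {alt!r}"}
    return verdict
```
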
P0 lock_file_held_past_ttl deadlock confidence=high new in v32 1 hit

What we saw

A lock file (claim/mutex) is older than 4× its declared TTL but still present. The holder crashed without releasing — every subsequent agent that respects the lock blocks indefinitely. Caught multiple times in Milo's multi-agent claim/release/audit/GC protocol when the GC step itself crashed.

Evidence

/Users/miloantaeus/.hermes/ops/control/state/silent_failure_detector.latest.json age=264884s (expected ≤ 14400s) → 18× past TTL

Fix recipe

Action: The GC step MUST run independently of the holder (separate cron, not in-process cleanup). Stamp every lock with expires_at (now + ttl), and have ANY consumer treat now > expires_at as evidence the lock is forfeit. Add a sentinel cron that age-scans the locks directory every 60s and removes expired entries.
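
A minimal sketch of the sentinel, assuming each lock is a JSON file stamped with an expires_at epoch and living under an assumed locks directory:

```python
import json
import time
from pathlib import Path

LOCKS_DIR = Path("~/.hermes/ops/control/state/locks").expanduser()  # assumed location

def gc_expired_locks() -> list[Path]:
    """Remove lock files whose expires_at has passed.

    Runs from its own cron every 60s, never in-process with the lock holder.
    """
    now = time.time()
    removed = []
    for lock in LOCKS_DIR.glob("*.lock"):
        try:
            expires_at = json.loads(lock.read_text()).get("expires_at", 0)
        except (OSError, ValueError):
            # Unreadable lock: fall back to mtime plus an assumed 3600s TTL with 4x grace.
            expires_at = lock.stat().st_mtime + 4 * 3600
        if now > expires_at:
            lock.unlink(missing_ok=True)
            removed.append(lock)
    return removed

if __name__ == "__main__":
    for path in gc_expired_locks():
        print(f"gc: removed expired lock {path}")
```
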
P0 ok_true_zero_duration silent_failure confidence=high 80 hits

What we saw

Action reports ok=true with duration_s=0 — almost certainly a no-op that fast-returned without doing the actual work. Common when a circuit breaker or feature flag short-circuits but the success path still emits ok=true.

Evidence

{"ts": "2026-05-11T19:59:08Z", "action_id": "minimax_capacity_burst", "outcome": {"bottleneck": {"severity": "critical", "detail": "rolling_utilization_pct=0.8%"}}, "ok": true, "duration_s": 0}

Fix recipe

Action: Find the early-return path in the action's executor. Distinguish "skipped on purpose" (skipped=true, ok=null OR skip_reason set) from "ran successfully" (ok=true, duration_s > 0.05). Audit the function for any return {"ok": True} that doesn't follow real I/O.
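
A sketch of that distinction as an executor wrapper, assuming plain dict results like the records above (the suspect_silent_failure flag is an addition, not an existing field):

```python
import time

def run_action(executor, *args, **kwargs) -> dict:
    """Wrap an action executor so skips and real successes are never conflated."""
    start = time.monotonic()
    result = executor(*args, **kwargs)
    duration_s = time.monotonic() - start

    if result.get("skipped"):
        # Deliberate short-circuit (circuit breaker, feature flag): not a success.
        return {"ok": None, "skipped": True,
                "skip_reason": result.get("skip_reason", "unspecified"),
                "duration_s": duration_s}

    if result.get("ok") and duration_s <= 0.05:
        # "Success" with no measurable work: surface it instead of passing it through.
        return {**result, "duration_s": duration_s, "suspect_silent_failure": True}

    return {**result, "duration_s": duration_s}
```
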
P1 snapshot_age_exceeds_sla frozen_state confidence=high new in v32 1 hit

What we saw

A snapshot/checkpoint file is older than its declared SLA (e.g. "refreshed every 1h" but mtime is 6h old). Downstream consumers that use the snapshot for grounding/recovery will operate on stale data. If recovery is triggered, the agent will roll back to a stale state and re-do hours of work.

Evidence

silent_failure_detector.latest.json age=264884s (expected ≤ 10800s) → 24× past SLA

Fix recipe

Action: Make every snapshot writer emit (a) the file, (b) a <file>.sla.json declaring refresh_interval_s and produced_at. Add a generic auditor cron that scans for any *.sla.json and flags age > 2 × refresh_interval_s. Page on age > 5×.
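
A sketch of both halves under exactly those conventions (paths and payload shape are assumptions):

```python
import json
import time
from pathlib import Path

def write_snapshot_with_sla(path: Path, payload: dict, refresh_interval_s: int) -> None:
    """Snapshot writer: emit the file plus its <file>.sla.json sidecar."""
    path.write_text(json.dumps(payload))
    sidecar = path.parent / (path.name + ".sla.json")
    sidecar.write_text(json.dumps({
        "refresh_interval_s": refresh_interval_s,
        "produced_at": time.time(),
    }))

def audit_slas(root: Path) -> list[str]:
    """Auditor cron: flag age > 2x the declared interval, page on > 5x."""
    findings, now = [], time.time()
    for sidecar in root.rglob("*.sla.json"):
        sla = json.loads(sidecar.read_text())
        ratio = (now - sla["produced_at"]) / sla["refresh_interval_s"]
        if ratio > 5:
            findings.append(f"PAGE {sidecar}: {ratio:.0f}x past SLA")
        elif ratio > 2:
            findings.append(f"WARN {sidecar}: {ratio:.0f}x past SLA")
    return findings
```
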
P1 stale_state_file_past_expected_refresh frozen_state confidence=high 1 hit

What we saw

A *.latest.json that's expected to be refreshed by a cron is older than 3× its interval — a strong signal that the cron stopped firing OR that its writer is silently failing. Common pattern: the cron reports last_status: ok but the state file's mtime is days old.

Evidence

silent_failure_detector.latest.json age=264884s (expected ≤ 10800s) → 24× past expected refresh

Fix recipe

Action: For each stale file: (a) check that its writing cron is loaded (launchctl list | grep <label>), (b) tail its err.log, (c) kickstart it manually and verify the file's mtime advances. If the kickstart doesn't refresh the file, the writer is almost certainly swallowing an exception in an over-broad try/except.
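
For step (c), a small verification helper, sketched with an assumed gui/<uid> launchd service target and poll window:

```python
import os
import subprocess
import time
from pathlib import Path

def kickstart_and_verify(label: str, state_file: Path, timeout_s: int = 120) -> bool:
    """Kickstart a launchd job and confirm the state file's mtime advances."""
    before = state_file.stat().st_mtime if state_file.exists() else 0.0
    target = f"gui/{os.getuid()}/{label}"  # assumed gui domain; use system/<label> for daemons
    subprocess.run(["launchctl", "kickstart", "-k", target], check=True)

    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if state_file.exists() and state_file.stat().st_mtime > before:
            return True  # the writer refreshed the file
        time.sleep(5)
    # No refresh after a forced run: suspect an exception swallowed by an
    # over-broad try/except inside the writer.
    return False
```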