Agent Failure Forensics — How to Catch the Silence Before It Costs You
Production agents are getting better at many things. Staying silent when they fail is not a feature — yet it is the default behavior in most LLM-powered pipelines today. Here is a concrete look at the silent-failure pattern, why it is so hard to debug, and a minimal pattern that fixes it using replay fixtures.
The Problem: Silence Looks Like Success
Most agent pipelines have one output channel: the happy path. When an agent's tool call fails (a timeout, a rate limit, a schema mismatch), the typical symptom is nothing. The pipeline completes. The log file ends with a clean `Done.` And somewhere downstream, a number is wrong, a record is missing, or a user sees stale data.
Consider an excerpt like the following, reconstructed from the kind of pipeline log that looks fine until you zoom in (step names and timestamps are illustrative):
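```
[12:04:01] step 1/4: fetch_records       ok (1,204 rows)
[12:04:03] step 2/4: summarize_batch     ok
[12:04:33] step 3/4: send_notification   ok (30.01s)
[12:04:33] step 4/4: write_audit_log     ok
[12:04:33] Done.
```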
The `send_notification()` call returned something — likely a timeout, a 5xx, or an expired auth token — but the pipeline treated it as a non-fatal event and moved on. It printed `Done.` and exited with code 0. Your alerting system stayed quiet. Your users received nothing.
This is the silent-failure trap: the pipeline is honest only about success. It has no mechanism to surface partial failures or degraded tool responses.
Why Standard Logging Fails Here
Most structured logs capture exit codes and timestamps. They rarely capture what the tool returned, what the agent decided to do with it, and whether downstream steps were skipped. When you need to replay a failure, you are left with a clean log and a vague post-mortem question: what actually happened between step 3 and step 4?
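For instance, a typical structured record for the failing step above might look like this (a schematic example; field names vary by logging stack):

```json
{"ts": "2024-05-01T12:04:33Z", "step": "send_notification", "status": "ok", "duration_ms": 30010}
```

The tool's actual response, the agent's decision about it, and the downstream steps it skipped are nowhere in that record.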
The Fix: Replay Fixtures at Every Tool Boundary
A replay fixture is a serialized checkpoint written at every tool boundary — before the call and after the response. When a run succeeds, fixtures are archived or discarded. When a run fails (or produces wrong output), you have a complete input/output snapshot of every tool call, ready to be replayed in isolation.
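Concretely, a fixture can be as small as one JSON file per tool call. A sketch of the shape, matching the implementation below (the schema is an illustrative choice, not a standard):

```json
{
  "tool": "send_notification",
  "ts": 1714565073.4,
  "input": {"args": ["user-42", "Your report is ready"], "kwargs": {}},
  "status": "ok",
  "output": {"error": "upstream timeout after 30s", "code": 504}
}
```

Note the trap in miniature: the tool returned an error payload without raising, so the pipeline saw "ok". The fixture, though, preserves the degraded response in full.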
Here is a minimal Python implementation — under 40 lines and dependency-free. Treat it as a sketch: the fixture schema and on-disk layout are illustrative choices, and a production version would need a sequence number for tools called more than once per run:
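```python
import functools
import json
import time
from pathlib import Path

FIXTURE_DIR = Path("fixtures")  # assumption: fixtures live beside the pipeline

def tool_checkpoint(run_id):
    """Wrap a tool so every call writes an input/output fixture to disk."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            record = {
                "tool": fn.__name__,
                "ts": time.time(),
                "input": {"args": list(args), "kwargs": kwargs},
            }
            try:
                result = fn(*args, **kwargs)
                # Capture the full payload, even if it is error-shaped.
                record["status"] = "ok"
                record["output"] = result
                return result
            except Exception as exc:
                record["status"] = "exception"
                record["output"] = repr(exc)
                raise  # fail loudly; the fixture survives for the replay
            finally:
                run_dir = FIXTURE_DIR / run_id
                run_dir.mkdir(parents=True, exist_ok=True)
                # Last write wins; add a sequence number if a tool runs twice.
                out = run_dir / f"{fn.__name__}.json"
                out.write_text(json.dumps(record, default=repr, indent=2))
        return wrapper
    return decorator

def replay(run_id, tool_name):
    """Load the recorded snapshot for a tool call, to re-run it in isolation."""
    return json.loads((FIXTURE_DIR / run_id / f"{tool_name}.json").read_text())
```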
Wrap your tool calls with `tool_checkpoint()` and every execution produces a fixture. When `send_notification()` returns a timeout, the fixture captures the full error payload — not just a status code. Retries become deterministic. Post-mortems become a `replay(run_id, "send_notification")` call away.
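Wiring it up might look like this (the run-id scheme and the tool body are stand-ins):

```python
import uuid

run_id = uuid.uuid4().hex  # one id per pipeline run

@tool_checkpoint(run_id)
def send_notification(user_id, message):
    # Stand-in for the real call to your notification provider.
    return {"status": "delivered", "user": user_id}

send_notification("user-42", "Your report is ready")

# Post-mortem: pull the exact recorded input and output for that call.
snapshot = replay(run_id, "send_notification")
print(snapshot["status"], snapshot["output"])
```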
What to Do With a Failed Fixture
A fixture is not just a log line. It is a reproducible test case. The moment you have a fixture for a failing tool call, you can:
- Write a unit test that calls the tool with the exact recorded input and asserts the expected output (see the sketch after this list)
- Add the fixture to a `fixtures/` directory that is replayed in CI on every pull request
- Use the fixture to confirm that a fix actually resolves the specific failure, not just the class of failures
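A regression test built from a fixture might look like this (the module path, run id, and asserted result shape are assumptions):

```python
# Assumes the checkpoint helpers and the tool live in pipeline.py.
from pipeline import replay, send_notification

def test_send_notification_timeout_fix():
    # Fixture recorded from the failing run; the run id is illustrative.
    snapshot = replay("7f3acd", "send_notification")
    args = snapshot["input"]["args"]
    kwargs = snapshot["input"]["kwargs"]
    # Re-run the tool with the exact recorded input.
    result = send_notification(*args, **kwargs)
    # The fix should turn the recorded timeout into a delivered notification.
    assert result["status"] == "delivered"
```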
The result is a pipeline that fails loudly, records precisely, and recovers quickly — instead of one that smiles through a breakdown.
Stop Letting Agent Failures Disappear
A full replay-fixture runner, a CLI diff tool, and a GitHub Actions template that gates PRs on fixture regressions — all in one open-source repo.
View on GitHub — It's Free
MIT license · No account required · Works with any LLM provider