Your orchestrator completes the full run successfully, yet the market_research_deepening metric reads 0.0. That gap between execution and evaluation is exactly what this sprint diagnoses — and fixes with artefacts you can deploy immediately.
A Propagation Audit Report pinpointing where market_research_deepening is dropped, suppressed, or never emitted, with annotated call-graph excerpts and Pydantic schema diffs.
A replay fixture (conftest.py + test case): a self-contained pytest fixture that deterministically reproduces the 0.0 scoring condition against your codebase, so your team can regression-test any fix without manual orchestration runs.
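To make the deliverable concrete, here is a minimal sketch of what such a fixture could look like. Every name here is illustrative (there is no real `score_market_research_deepening` function or payload shape in this document); the stand-in scorer simply mimics the failure mode where a missing or non-numeric field collapses to 0.0.

```python
# conftest.py -- hypothetical regression fixture (all names illustrative)
import pytest


def score_market_research_deepening(payload: dict) -> float:
    # Stand-in for the real scoring module: a missing or non-numeric
    # field silently collapses to 0.0 -- the condition under audit.
    value = payload.get("market_research_deepening")
    return float(value) if isinstance(value, (int, float)) else 0.0


@pytest.fixture
def orphaned_payload():
    # Worker subagent output with the metric field absent,
    # deterministically reproducing the 0.0 score.
    return {
        "agent": "market_research_worker",
        "findings": ["competitor A raised prices"],
        # "market_research_deepening" intentionally omitted
    }


def test_missing_field_scores_zero(orphaned_payload):
    assert score_market_research_deepening(orphaned_payload) == 0.0
```

Because the fixture builds the payload in-process, the regression test runs in milliseconds and needs no live orchestrator.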
Yes. The audit is architecture-agnostic and focuses on the data-flow contract between your orchestrator (LangGraph, Claude Code skills, or custom multi-agent), the worker subagent payload, and the scoring module. If you're using Pydantic schemas for output validation, the report will explicitly call out any missing or mis-typed market_research_deepening fields in those schemas.
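To illustrate the kind of schema gap the report would flag, here is a hypothetical Pydantic model pair (field names assumed, not taken from any real codebase). By default, Pydantic models ignore unknown input fields, so a field present in the worker's output but absent from the orchestrator's schema is silently stripped before scoring.

```python
from pydantic import BaseModel


class WorkerOutput(BaseModel):
    # What the worker subagent emits.
    findings: list[str]
    market_research_deepening: float


class OrchestratorPayload(BaseModel):
    # What the orchestrator forwards to scoring. Pydantic drops
    # unknown fields by default, so the metric never survives
    # this hop -- the schema diff the audit would surface.
    findings: list[str]
    # market_research_deepening is missing here


worker = WorkerOutput(
    findings=["competitor A raised prices"],
    market_research_deepening=0.8,
)
forwarded = OrchestratorPayload(**worker.model_dump())
# "market_research_deepening" is absent from forwarded.model_dump()
```

A one-line fix (adding the field to `OrchestratorPayload`) resolves this class of bug, which is why the report pairs each finding with the exact schema diff.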
The audit delivers an unambiguous determination. If the 0.0 score is intentional gating (e.g., a safety filter or minimum-confidence threshold), the report documents the exact gate condition, its threshold, and the recommended adjustment range, with a risk note for any threshold change. Either way, you'll have a written record.
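As a sketch of what "intentional gating" means here, assuming a hypothetical minimum-confidence gate (both the function and the threshold value are illustrative): a genuine score can surface downstream as 0.0 simply because confidence fell below the floor.

```python
MIN_CONFIDENCE = 0.6  # hypothetical gate threshold (illustrative)


def gated_score(raw_score: float, confidence: float) -> float:
    # An intentional gate: below the confidence floor the score is
    # zeroed rather than reported, so a real value can read as 0.0
    # downstream even though scoring itself worked correctly.
    return raw_score if confidence >= MIN_CONFIDENCE else 0.0
```

If the audit finds a gate like this, the report states the condition and threshold explicitly rather than treating the 0.0 as a propagation bug.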
You can anonymize the orchestrator graph definition and scoring module independently — the replay fixture accepts a structured input dict that mirrors your data shape without requiring your full production codebase. Milo will share a minimal test harness template first so you can see exactly what data shape is needed.
The sprint delivers the five artefacts as specified within 5 business days. If the diagnosis identifies a deeper structural issue that warrants a second engagement, Milo will flag it explicitly in the Propagation Audit Report with a recommended scope and timeline — so you can decide whether to proceed without ambiguity.