The audit ran the 32-rule cost library against the 30-day usage CSV. Of the $14,247 in monthly spend, $4,890 was traced to five concentrated cost drivers with concrete, ship-today fixes. Each driver below includes a fix recipe with effort estimate and ROI multiplier.
Ranked by monthly spend. Percentages are of total $14,247 monthly bill.
| Rank | Cost driver | Monthly spend | % of total | Fix |
|---|---|---|---|---|
| 1 |
Claude Opus 4.1 used for JSON extraction
Schema-extraction calls routed to Opus where Haiku would match quality
|
$4,820 | 33.8% | Route to Haiku — saves $4,113/mo (85% reduction) |
| 2 |
Retry storm on tool-call errors
3–10× replays per failed tool call with no backoff cap
|
$2,340 | 16.4% | Exponential backoff, cap at 2 retries — saves $1,560/mo |
| 3 |
Long system prompts re-sent every request
4,200-token system prompt on every call, zero prompt-cache hits
|
$1,890 | 13.3% | Enable prompt caching — saves $1,134/mo (60%) |
| 4 |
Embedding re-indexing every hour
Full corpus re-embed 24×/day where nightly would suffice
|
$1,210 | 8.5% | Move to nightly re-index — saves $1,068/mo (88%) |
| 5 |
GPT-4o for tasks Gemini Flash handles
Classification + short-answer routes where Flash matches accuracy at lower cost
|
$890 | 6.2% | Route to Gemini 1.5 Flash — saves $623/mo (70%) |
| — |
All other drivers
27 remaining rule matches, individually under $300/mo each
|
$3,097 | 21.7% | Documented in the appendix for future review |
| TOTAL MONTHLY SPEND | $14,247 | 100% | $4,890/mo recoverable identified | |
Top 10 prompts ranked by token-count × frequency. Red > 10M tokens/day, yellow 5–10M, green < 5M.
The two red prompts (prompt-a3f29e at 4,247 tokens × 2,847 calls/day and prompt-b81d44 at 3,640 tokens × 2,802 calls/day) account for 60% of all input-token spend. Both are prime candidates for prompt caching — see Fix #3.
Where a cheaper model matches output quality. Each recommendation is backed by a 50-case hold-out evaluation included in the appendix.
| Current model | Use case | Suggested model | Reasoning |
|---|---|---|---|
| claude-opus-4-1 | JSON schema extraction | claude-haiku-4-5 | Schema-extraction quality matches at 1/15 cost. 50-case eval: 100% schema-valid output, 96% field-accuracy match. |
| gpt-4o | Customer-support intent classification | gemini-1.5-flash | 50-case eval shows identical accuracy at 1/8 cost. Latency is comparable. |
| gpt-4o | Short-answer Q&A | gemini-1.5-flash | Hold-out shows Flash matches GPT-4o on factual short-form responses with no quality regression. |
| claude-sonnet-4 | Long-document summarization | claude-sonnet-4 | Keep — best quality-per-dollar for this workload. Downgrading to Haiku lost summary fidelity in eval. |
| claude-opus-4-1 | Complex multi-step reasoning | claude-opus-4-1 | Keep — Opus is the right tool for this lane. Downgrading regresses on reasoning depth. |
Ordered by ROI (savings ÷ eng-hours). Each recipe includes the concrete code or config change required.
The audit ran the 32-rule cost library against your usage CSV in-memory. Each rule looked for a specific cost pattern: retry storms, prompt bloat, model mismatch, embedding waste, context-window inflation, reasoning-token overruns, customer-level outliers, cache-miss patterns, and 24 more. Findings were ranked by $/month impact and de-duplicated where two rules flagged the same underlying cause.
Fix recipes include ROI estimates assuming the engineering hours noted. ROI = (monthly savings × 12 months) ÷ (eng-hours × $250/hour blended rate). Quality match for routing recommendations is backed by a 50-case hold-out evaluation against a held-back slice of your actual production data. The full eval CSVs are included in the appendix.
Your usage data was processed in-memory, the report was generated, and the input was discarded immediately. No raw API keys were requested. The retained record is your PayPal transaction ID and email, kept only for the refund window.
Upload your last 30 days of OpenAI, Anthropic, or Gemini usage CSV and get the same report format — with your real spend, your real prompts, your real fix recipes — within 24 hours.
Get the $299 Deep Report →