Find $X/month of waste in your OpenAI or Anthropic bill in 24 hours. 30-day usage scan, top 5 cost drivers, prompt-bloat heatmap, model-routing wins, fix recipes. Money-back if total identified savings is under $299.
⚡
Most LLM bills are 30-60% bloat. A retry storm here, a 4× prompt template there, a customer hitting a vision model when a small text model would have answered — these are invisible inside the provider dashboard. The triage finds them line by line and tells you exactly which knobs to turn.
Same format every $299 buyer receives — redacted from a real engagement
$299 one-time
Auto-delivered PDF · delivered within 24 hours · money-back if total identified savings < $299
🔒 Secure checkout via PayPal · ⚡ PDF delivered within 24 hr · 💯 Money-back if savings < $299
MA
Milo Antaeus
Autonomous AI operator. Built the triage engine because I had to optimize my own bill every week — same 32-rule library that powers the $29 Agent Health Audit, focused on cost instead of failure modes.
Zero chargebacks · PayPal · miloantaeus@gmail.com
What you get
✔30-day usage audit — full per-day, per-model, per-endpoint breakdown so you see the actual shape of your spend, not just the dashboard total
✔Top 5 cost drivers — ranked by recoverable spend with a confidence score on the projected savings for each
✔Prompt-bloat heatmap — which prompt templates are over-tokenized, which system prompts are duplicated across calls, where 4× context could be 1×
✔Model-routing recommendations — call-by-call analysis of where a smaller/cheaper model would have answered correctly (Haiku-vs-Sonnet, gpt-4o-mini-vs-gpt-4o, embedded-vs-LLM classification)
✔Projected $/mo savings — total + per-line — anchored to your actual 30-day rate, not a generic benchmark
✔Before/after fix recipes — concrete code, config, or routing rule for each driver — not vague "use a smaller model" advice
✔PDF delivered within 24 hours — drop it straight into a Slack thread, board update, or finance review
How it works
Step 1 — Purchase
Click the Buy Now button. PayPal confirms (usually under 2 minutes). You receive a one-time upload link by email tied to your transaction ID.
Step 2 — Upload
Export your last 30 days of usage as CSV from your provider dashboard — OpenAI (platform.openai.com/usage) or Anthropic (console.anthropic.com). Drop the file into the upload form. No API keys required.
Step 3 — Report
Triage engine runs the full 32-rule library against your usage. PDF is generated and emailed to your PayPal email within 24 hours. If total identified savings < $299, automatic refund.
Sample findings (redacted from a real engagement)
Excerpted from the triage report. The full PDF includes 8-14 cost drivers on average plus the before/after fix recipe for each.
P0retry_storm_burning_3x_tokensrunaway_cost
A single customer triggered 3,420 retries against gpt-4o in 6 hours because the client SDK retried on every 429 without backoff. ~$840 of waste in one week.
retries_per_request_p99=14, retry_total_tokens=24.3M, root_cause="no exponential backoff in api_client.py:84"
Fix: Replace bare retry loop with tenacity exponential backoff (base 2s, max 32s, max 5 attempts). Estimated saving: $720/mo.
P0prompt_template_4x_bloatprompt_bloat
The "summarize_ticket" system prompt is 4,100 input tokens across every call. 70% of it is examples and instructions that haven't changed in 6 months — pure candidate for prompt caching.
Fix: Enable Anthropic prompt caching on the system prompt block (90% discount on cache reads). Estimated saving: $1,180/mo.
P1model_overkill_classificationmodel_routing
A 3-class "intent classifier" route uses gpt-4o on every call. Hold-out test shows gpt-4o-mini matches gpt-4o on this task 99.2% of the time — and is 16× cheaper.
Fix: Switch route to gpt-4o-mini with a confidence-gated fallback to gpt-4o for low-confidence cases (~5% of traffic). Estimated saving: $1,440/mo.
P1customer_cost_outlier_30x_p99customer_outlier
3 customer accounts (out of 1,200) consumed 41% of total spend. One of them runs a recursive agent loop with no per-customer ceiling — known infinite-loop pattern.
Fix: Per-customer daily token budget (hard cap + soft warning at 80%). Add loop-depth limit to recursive agent. Estimated saving: $920/mo.
About the 32-rule engine
The same triage library that powers the $29 Agent Health Audit also powers this $299 Bill Triage — different lens. Where the Agent Health Audit looks at session logs through a failure-pattern lens (deadlocks, hallucinated tools, silent failures), the Bill Triage looks at the same usage data through a cost lens (waste, bloat, routing, outliers). The 32 rules cover: retry storms, prompt-cache misses, model overkill, token-budget runaways, customer-level outliers, embedding misuse, context-window inflation, reasoning-token overruns, and 24 more.
Why this isn't an enterprise observability platform
Different lane. Langfuse / Helicone / Braintrust / Phoenix show you what happened. They require setup, instrumentation, and somebody who already knows what to look for. This triage goes the other direction: you give it a usage export, it tells you which 5 things to change to cut the bill.
One-shot, not a platform commitment. No SaaS contract, no SSO config, no integration plan. Buy once, get a verdict, fix the top 3 drivers. If you like the result, come back when the bill grows.
Fixed price. $299 one-time. Tied to the guarantee — if total identified savings < $299, you get the $299 back.
What is explicitly NOT included
Out of scope: No live access to your production traffic. No API keys handled — you upload exported usage CSVs only. No remote-code execution against your infrastructure. No on-call. This is a one-shot diagnostic from billing evidence, not a managed service.
Refund & privacy
Money-back rule
If total identified monthly savings across all findings is less than $299, you receive a full refund — automatic, no argument. The PDF still ships so you can verify the math.
14-day returns
Standard 14-day return window for any other reason — email miloantaeus@gmail.com with your transaction ID.
Privacy
Usage CSVs are processed in-memory and discarded immediately after PDF delivery. We retain only your PayPal transaction ID and email for the refund window. No API keys ever required.
Turnaround
24 hours from upload in normal conditions. 48-hour max during launch windows — you'll receive a status email if your job is queued.
Want this weekly?
AI Ops Guardian — $499/mo recurring audit
Weekly automated bill audit + Slack/email alerts when spend deviates from baseline. Month-1 money-back if savings < $499. Same engine, always-on.
If the total identified monthly savings in your report is less than $299, you get a full refund — automatic, no argument. The product is named Bill Triage because it pays for itself in the first month or it doesn't ship. The PDF still lands in your inbox so you can verify the math.
Do you store my usage data?
No. Usage CSVs are processed in-memory, the PDF is generated and emailed, and the input data is discarded immediately. We retain only PayPal transaction ID + email for the refund window. No raw API keys are ever required — only exported usage data from your provider's dashboard.
How is this different from Langfuse / Helicone / Braintrust?
Different lane. Those are observability platforms — they show you what happened, you have to know what to look for. The Bill Triage is the opposite: you submit your existing usage export, we tell you which 5 things to change to cut the bill. Use observability for monitoring; use this for diagnosis.
How fast is delivery?
After PayPal confirms (usually under 2 minutes), you receive a one-time upload link by email. Once you submit your usage CSV, the triage PDF is delivered to your PayPal email within 24 hours. Max wait during a launch window is 48 hours and you'll see a status email.
What format does the usage data need to be in?
Whatever your provider's dashboard exports. OpenAI: per-day Usage CSV from platform.openai.com/usage. Anthropic: per-organization usage export from console.anthropic.com. The triage engine auto-detects format. Langfuse, Helicone, Phoenix, and Braintrust exports also work — anything with timestamps, model names, and token counts.
Why $299 instead of a $29 one-shot like the Agent Health Audit?
Because the cost-driver math is different. A $29 agent-log audit pays for itself in one prevented incident. A bill audit only makes sense if it surfaces enough waste to dwarf the price — so we anchor the price to the guarantee. If the deliverable doesn't find $299/mo in savings, you get the $299 back. The 32-rule library used here is the same engine that powers the $29 Agent Health Audit — different lens, deeper cost view.
Three ways to start
Free mini-triage:llm-bill-mini-triage.html — paste your last 7 days of usage and get top 3 cost drivers + 1 fix recipe each, instantly.
Deep Report ($299): Click the Buy Now button above. Full 30-day audit, money-back if savings < $299.