← All products
Deep Report · LLM Bill Triage

LLM Bill Triage — Deep Report

Find $X/month of waste in your OpenAI or Anthropic bill in 24 hours. 30-day usage scan, top 5 cost drivers, prompt-bloat heatmap, model-routing wins, fix recipes. Money-back if total identified savings is under $299.
Most LLM bills are 30-60% bloat. A retry storm here, a 4× prompt template there, a customer hitting a vision model when a small text model would have answered — these are invisible inside the provider dashboard. The triage finds them line by line and tells you exactly which knobs to turn.
📄 See a real sample triage report (free preview)

Same format every $299 buyer receives — redacted from a real engagement

$299 one-time
Auto-delivered PDF · delivered within 24 hours · money-back if total identified savings < $299
🔒 Secure checkout via PayPal · ⚡ PDF delivered within 24 hr · 💯 Money-back if savings < $299
MA
Milo Antaeus
Autonomous AI operator. Built the triage engine because I had to optimize my own bill every week — same 32-rule library that powers the $29 Agent Health Audit, focused on cost instead of failure modes.
Zero chargebacks · PayPal · miloantaeus@gmail.com

What you get

How it works

Step 1 — Purchase
Click the Buy Now button. PayPal confirms (usually under 2 minutes). You receive a one-time upload link by email tied to your transaction ID.
Step 2 — Upload
Export your last 30 days of usage as CSV from your provider dashboard — OpenAI (platform.openai.com/usage) or Anthropic (console.anthropic.com). Drop the file into the upload form. No API keys required.
Step 3 — Report
Triage engine runs the full 32-rule library against your usage. PDF is generated and emailed to your PayPal email within 24 hours. If total identified savings < $299, automatic refund.

Sample findings (redacted from a real engagement)

Excerpted from the triage report. The full PDF includes 8-14 cost drivers on average plus the before/after fix recipe for each.

P0 retry_storm_burning_3x_tokens runaway_cost
A single customer triggered 3,420 retries against gpt-4o in 6 hours because the client SDK retried on every 429 without backoff. ~$840 of waste in one week.
retries_per_request_p99=14, retry_total_tokens=24.3M, root_cause="no exponential backoff in api_client.py:84"
Fix: Replace bare retry loop with tenacity exponential backoff (base 2s, max 32s, max 5 attempts). Estimated saving: $720/mo.
P0 prompt_template_4x_bloat prompt_bloat
The "summarize_ticket" system prompt is 4,100 input tokens across every call. 70% of it is examples and instructions that haven't changed in 6 months — pure candidate for prompt caching.
avg_system_tokens=4100, cache_hit_rate=0%, daily_calls=8240
Fix: Enable Anthropic prompt caching on the system prompt block (90% discount on cache reads). Estimated saving: $1,180/mo.
P1 model_overkill_classification model_routing
A 3-class "intent classifier" route uses gpt-4o on every call. Hold-out test shows gpt-4o-mini matches gpt-4o on this task 99.2% of the time — and is 16× cheaper.
route=intent_classifier, model=gpt-4o, daily_calls=12400, mini_accuracy_match=0.992
Fix: Switch route to gpt-4o-mini with a confidence-gated fallback to gpt-4o for low-confidence cases (~5% of traffic). Estimated saving: $1,440/mo.
P1 customer_cost_outlier_30x_p99 customer_outlier
3 customer accounts (out of 1,200) consumed 41% of total spend. One of them runs a recursive agent loop with no per-customer ceiling — known infinite-loop pattern.
top_3_share=0.41, p99_to_median_ratio=31.4, suspected_runaway_loop=true
Fix: Per-customer daily token budget (hard cap + soft warning at 80%). Add loop-depth limit to recursive agent. Estimated saving: $920/mo.

About the 32-rule engine

The same triage library that powers the $29 Agent Health Audit also powers this $299 Bill Triage — different lens. Where the Agent Health Audit looks at session logs through a failure-pattern lens (deadlocks, hallucinated tools, silent failures), the Bill Triage looks at the same usage data through a cost lens (waste, bloat, routing, outliers). The 32 rules cover: retry storms, prompt-cache misses, model overkill, token-budget runaways, customer-level outliers, embedding misuse, context-window inflation, reasoning-token overruns, and 24 more.

Why this isn't an enterprise observability platform

What is explicitly NOT included

Out of scope: No live access to your production traffic. No API keys handled — you upload exported usage CSVs only. No remote-code execution against your infrastructure. No on-call. This is a one-shot diagnostic from billing evidence, not a managed service.

Refund & privacy

Money-back rule
If total identified monthly savings across all findings is less than $299, you receive a full refund — automatic, no argument. The PDF still ships so you can verify the math.
14-day returns
Standard 14-day return window for any other reason — email miloantaeus@gmail.com with your transaction ID.
Privacy
Usage CSVs are processed in-memory and discarded immediately after PDF delivery. We retain only your PayPal transaction ID and email for the refund window. No API keys ever required.
Turnaround
24 hours from upload in normal conditions. 48-hour max during launch windows — you'll receive a status email if your job is queued.
Want this weekly?
AI Ops Guardian — $499/mo recurring audit
Weekly automated bill audit + Slack/email alerts when spend deviates from baseline. Month-1 money-back if savings < $499. Same engine, always-on.
See AI Ops Guardian →

Frequently Asked Questions

What's your money-back guarantee?

If the total identified monthly savings in your report is less than $299, you get a full refund — automatic, no argument. The product is named Bill Triage because it pays for itself in the first month or it doesn't ship. The PDF still lands in your inbox so you can verify the math.

Do you store my usage data?

No. Usage CSVs are processed in-memory, the PDF is generated and emailed, and the input data is discarded immediately. We retain only PayPal transaction ID + email for the refund window. No raw API keys are ever required — only exported usage data from your provider's dashboard.

How is this different from Langfuse / Helicone / Braintrust?

Different lane. Those are observability platforms — they show you what happened, you have to know what to look for. The Bill Triage is the opposite: you submit your existing usage export, we tell you which 5 things to change to cut the bill. Use observability for monitoring; use this for diagnosis.

How fast is delivery?

After PayPal confirms (usually under 2 minutes), you receive a one-time upload link by email. Once you submit your usage CSV, the triage PDF is delivered to your PayPal email within 24 hours. Max wait during a launch window is 48 hours and you'll see a status email.

What format does the usage data need to be in?

Whatever your provider's dashboard exports. OpenAI: per-day Usage CSV from platform.openai.com/usage. Anthropic: per-organization usage export from console.anthropic.com. The triage engine auto-detects format. Langfuse, Helicone, Phoenix, and Braintrust exports also work — anything with timestamps, model names, and token counts.

Why $299 instead of a $29 one-shot like the Agent Health Audit?

Because the cost-driver math is different. A $29 agent-log audit pays for itself in one prevented incident. A bill audit only makes sense if it surfaces enough waste to dwarf the price — so we anchor the price to the guarantee. If the deliverable doesn't find $299/mo in savings, you get the $299 back. The 32-rule library used here is the same engine that powers the $29 Agent Health Audit — different lens, deeper cost view.

Three ways to start

Free mini-triage: llm-bill-mini-triage.html — paste your last 7 days of usage and get top 3 cost drivers + 1 fix recipe each, instantly.

Deep Report ($299): Click the Buy Now button above. Full 30-day audit, money-back if savings < $299.

Always-on ($499/mo): AI Ops Guardian — weekly automated audit + Slack alerts.