Static-analysis cost audit · https://github.com/openai/openai-cookbook · Generated 2026-05-16 21:12 UTC
7 ranked cost leaks across 275 files (top 5 shown in the table below). Implementing all of them could save approximately $560/month ($6,720/year); the top 3 alone account for $360/month.
| # | Leak | Severity | $/mo saved |
|---|---|---|---|
| 1 | API call in try/except with no backoff — potential retry storm (`answers_with_ft.py`) | MEDIUM | $120 |
| 2 | API call in try/except with no backoff — potential retry storm (`openai_util.py`) | MEDIUM | $120 |
| 3 | API call in try/except with no backoff — potential retry storm (`openai_language_model.py`) | MEDIUM | $120 |
| 4 | 4 hardcoded model strings without env-var indirection (`test_dynamic_result_columns.py`) | MEDIUM | $50 |
| 5 | 4 hardcoded model strings without env-var indirection (`embeddings_utils.py`) | MEDIUM | $50 |
Where: examples/fine-tuned_qa/answers_with_ft.py:80
What we found: An LLM API call is wrapped in try/except, but no backoff or sleep is detected anywhere in this file. During a transient outage, the wrapping loop can hammer the provider for as long as it runs, generating billable input tokens on every failed attempt. Add exponential backoff via the `backoff` or `tenacity` library, or at minimum `time.sleep(min(2**attempt, 30))`.
Excerpt:

```python
print("Context:\n" + context)
print("\n\n")
try:
    # fine-tuned models requires model parameter, whereas other models require engine parameter
    model_param = (
        {"model": fine_tuned_qa_model}
        if ":" in fine_tuned_qa_model
```
Suggested fix:

```python
import backoff

@backoff.on_exception(backoff.expo, Exception, max_tries=4, max_time=60)
def call_with_retry(**kwargs):
    return client.chat.completions.create(**kwargs)
```
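The manual fallback the report mentions (`time.sleep(min(2**attempt, 30))`) can be sketched end to end without any extra dependency. This is a minimal sketch, not repo code: `make_call` is a hypothetical stand-in for the real client call (e.g. a lambda wrapping `chat.completions.create`), and `base` exists only to make the delay tunable.

```python
import time

def call_with_retry(make_call, max_tries=4, base=1.0):
    """Retry a transiently failing call with capped exponential backoff.

    make_call is a zero-argument callable standing in for the real
    API call; it is a placeholder, not part of any client library.
    """
    for attempt in range(max_tries):
        try:
            return make_call()
        except Exception:
            if attempt == max_tries - 1:
                raise  # retries exhausted: surface the original error
            # waits base*1, base*2, base*4, ... seconds, capped at 30s
            time.sleep(min(base * (2 ** attempt), 30))
```

With the defaults this waits 1s, 2s, then 4s across four attempts, bounding a transient outage to a handful of billable calls instead of an unthrottled loop.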
Where: examples/object_oriented_agentic_approach/resources/object_oriented_agents/utils/openai_util.py:31
What we found: An LLM API call is wrapped in try/except, but no backoff or sleep is detected anywhere in this file. During a transient outage, the wrapping loop can hammer the provider for as long as it runs, generating billable input tokens on every failed attempt. Add exponential backoff via the `backoff` or `tenacity` library, or at minimum `time.sleep(min(2**attempt, 30))`.
Excerpt:

```python
kwargs["tools"] = tools
try:
    response = openai_client.chat.completions.create(**kwargs)
    return response
except Exception as e:
    logger.error(f"OpenAI call failed: {str(e)}")
```
Suggested fix:

```python
import backoff

@backoff.on_exception(backoff.expo, Exception, max_tries=4, max_time=60)
def call_with_retry(**kwargs):
    return openai_client.chat.completions.create(**kwargs)
```
Where: examples/object_oriented_agentic_approach/resources/object_oriented_agents/services/openai_language_model.py:44
What we found: An LLM API call is wrapped in try/except, but no backoff or sleep is detected anywhere in this file. During a transient outage, the wrapping loop can hammer the provider for as long as it runs, generating billable input tokens on every failed attempt. Add exponential backoff via the `backoff` or `tenacity` library, or at minimum `time.sleep(min(2**attempt, 30))`.
Excerpt:

```python
self.logger.debug("Generating completion with OpenAI model.")
self.logger.debug(f"Request: {kwargs}")
try:
    response = self.openai_client.chat.completions.create(**kwargs)
    self.logger.debug("Received response from OpenAI.")
    self.logger.debug(f"Response: {response}")
    return response
```
Suggested fix:

```python
import backoff

@backoff.on_exception(backoff.expo, Exception, max_tries=4, max_time=60)
def call_with_retry(self, **kwargs):
    return self.openai_client.chat.completions.create(**kwargs)
```
Where: examples/evals/realtime_evals/tests/test_dynamic_result_columns.py:97
What we found: Found 4 hardcoded model strings in this file, none routed through env vars. This blocks A/B tests against cheaper models, prevents quick rollback when a vendor releases a better-priced equivalent, and forces a code deploy for every routing change. Introduce env vars (MODEL_PRIMARY, MODEL_RERANK, MODEL_BATCH).

Suggested fix:

```python
# before
model="assistant"
# after (requires `import os` at the top of the file)
model=os.getenv("MODEL_PRIMARY", "assistant")
```
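The env-var indirection above generalizes to a tiny helper shared across call sites. A minimal sketch assuming the report's suggested env-var names (MODEL_PRIMARY, MODEL_RERANK, MODEL_BATCH); `model_for` is a hypothetical helper, not an existing utility in the repo.

```python
import os

def model_for(env_var, default):
    """Resolve a model name from the environment, falling back to the
    string the call site currently hardcodes."""
    return os.getenv(env_var, default)

# call sites change from  model="assistant"  to:
model = model_for("MODEL_PRIMARY", "assistant")
```

Swapping to a cheaper model, or rolling back after a bad swap, then becomes an env change rather than a code deploy.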
Where: examples/utils/embeddings_utils.py:18
What we found: Found 4 hardcoded model strings in this file, none routed through env vars. This blocks A/B tests against cheaper models, prevents quick rollback when a vendor releases a better-priced equivalent, and forces a code deploy for every routing change. Introduce env vars (MODEL_PRIMARY, MODEL_RERANK, MODEL_BATCH).

Suggested fix:

```python
# before
model="text-embedding-3-small"
# after (requires `import os` at the top of the file)
model=os.getenv("MODEL_PRIMARY", "text-embedding-3-small")
```
Where: examples/partners/agentic_governance_guide/promptfoo/promptfoo_target.py:78
What we found: Found 4 hardcoded model strings in this file, none routed through env vars. This blocks A/B tests against cheaper models, prevents quick rollback when a vendor releases a better-priced equivalent, and forces a code deploy for every routing change. Introduce env vars (MODEL_PRIMARY, MODEL_RERANK, MODEL_BATCH).

Suggested fix:

```python
# before
model="gpt-5.2"
# after (requires `import os` at the top of the file)
model=os.getenv("MODEL_PRIMARY", "gpt-5.2")
```
Where: examples/voice_solutions/realtime_translation_guide/browser-translation-demo/test/server.test.js:114
What we found: Found 3 hardcoded model strings in this file, none routed through env vars. This blocks A/B tests against cheaper models, prevents quick rollback when a vendor releases a better-priced equivalent, and forces a code deploy for every routing change. Introduce env vars (MODEL_PRIMARY, MODEL_RERANK, MODEL_BATCH).

Suggested fix:

```javascript
// before
model="gpt-realtime-translate"
// after (server.test.js is JavaScript, so use process.env rather than os.getenv)
model = process.env.MODEL_PRIMARY || "gpt-realtime-translate"
```
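Because server.test.js is JavaScript, the same indirection goes through `process.env` rather than `os.getenv`. A minimal sketch: `modelFor` is a hypothetical helper, and the env-var name follows the report's MODEL_PRIMARY suggestion.

```javascript
// Resolve a model name from the environment, falling back to the
// string the call site currently hardcodes.
function modelFor(envVar, fallback) {
  return process.env[envVar] || fallback;
}

// call sites change from  model: "gpt-realtime-translate"  to:
const model = modelFor("MODEL_PRIMARY", "gpt-realtime-translate");
```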
The v1 audit does not include the per-call-site cost table shown in the public sample report; that table requires your billing CSV, uploaded during intake (coming in v2). The findings above are based on static code analysis only, with estimated $/mo savings calibrated to mid-size SaaS workloads. For a calibrated cost table, email miloantaeus@gmail.com with your last 30 days of billing CSV and we will regenerate the report at no extra charge.
Why this matters: vendors have a strong incentive to inflate projected savings. The re-audit voucher creates an accountability loop, binding our reputation to actual outcomes rather than promises. If you implement none of the recommendations, the outcome is on you; if you implement all of them and your bill still goes up, we refund.