Static-analysis cost audit · https://github.com/openai/openai-cookbook · Generated 2026-05-16 21:12 UTC
7 ranked cost leaks across 275 files (top 5 shown in the table below). Implementing all of them could save approximately $560/month ($6,720/year); the top 3 alone account for $360/month.
| # | Leak | Severity | $/mo saved |
|---|---|---|---|
| 1 | API call in try/except with no backoff — potential retry storm (`answers_with_ft.py`) | MEDIUM | $120 |
| 2 | API call in try/except with no backoff — potential retry storm (`openai_util.py`) | MEDIUM | $120 |
| 3 | API call in try/except with no backoff — potential retry storm (`openai_language_model.py`) | MEDIUM | $120 |
| 4 | 4 hardcoded model strings without env-var indirection (`test_dynamic_result_columns.py`) | MEDIUM | $50 |
| 5 | 4 hardcoded model strings without env-var indirection (`embeddings_utils.py`) | MEDIUM | $50 |
Where: examples/fine-tuned_qa/answers_with_ft.py:80
What we found: An LLM API call is wrapped in try/except, but no backoff or sleep is detected anywhere in this file. During a transient outage, the wrapping loop can hammer the provider for as long as it runs, generating billable input tokens on every failed attempt. Add exponential backoff via the `backoff` or `tenacity` library, or at minimum `time.sleep(min(2**attempt, 30))`.
Excerpt:

```python
print("Context:\n" + context)
print("\n\n")
try:
    # fine-tuned models requires model parameter, whereas other models require engine parameter
    model_param = (
        {"model": fine_tuned_qa_model}
        if ":" in fine_tuned_qa_model
```
Suggested fix:

```python
import backoff

@backoff.on_exception(backoff.expo, Exception, max_tries=4, max_time=60)
def call_with_retry(**kwargs):
    return client.chat.completions.create(**kwargs)
```
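The manual fallback the report mentions (`time.sleep(min(2**attempt, 30))`) can be sketched end to end without any extra dependency. This is a minimal sketch, not repo code: `make_call` is a hypothetical stand-in for the real client call (e.g. a lambda wrapping `chat.completions.create`), and `base` exists only to make the delay tunable.

```python
import time

def call_with_retry(make_call, max_tries=4, base=1.0):
    """Retry a transiently failing call with capped exponential backoff.

    make_call is a zero-argument callable standing in for the real
    API call; it is a placeholder, not part of any client library.
    """
    for attempt in range(max_tries):
        try:
            return make_call()
        except Exception:
            if attempt == max_tries - 1:
                raise  # retries exhausted: surface the original error
            # waits base*1, base*2, base*4, ... seconds, capped at 30s
            time.sleep(min(base * (2 ** attempt), 30))
```

With the defaults this waits 1s, 2s, then 4s across four attempts, bounding a transient outage to a handful of billable calls instead of an unthrottled loop.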
Where: examples/object_oriented_agentic_approach/resources/object_oriented_agents/utils/openai_util.py:31
What we found: An LLM API call is wrapped in try/except, but no backoff or sleep is detected anywhere in this file. During a transient outage, the wrapping loop can hammer the provider for as long as it runs, generating billable input tokens on every failed attempt. Add exponential backoff via the `backoff` or `tenacity` library, or at minimum `time.sleep(min(2**attempt, 30))`.
Excerpt:

```python
kwargs["tools"] = tools
try:
    response = openai_client.chat.completions.create(**kwargs)
    return response
except Exception as e:
    logger.error(f"OpenAI call failed: {str(e)}")
```
Suggested fix:

```python
import backoff

@backoff.on_exception(backoff.expo, Exception, max_tries=4, max_time=60)
def call_with_retry(**kwargs):
    return openai_client.chat.completions.create(**kwargs)
```
Where: examples/object_oriented_agentic_approach/resources/object_oriented_agents/services/openai_language_model.py:44
What we found: An LLM API call is wrapped in try/except, but no backoff or sleep is detected anywhere in this file. During a transient outage, the wrapping loop can hammer the provider for as long as it runs, generating billable input tokens on every failed attempt. Add exponential backoff via the `backoff` or `tenacity` library, or at minimum `time.sleep(min(2**attempt, 30))`.
Excerpt:

```python
self.logger.debug("Generating completion with OpenAI model.")
self.logger.debug(f"Request: {kwargs}")
try:
    response = self.openai_client.chat.completions.create(**kwargs)
    self.logger.debug("Received response from OpenAI.")
    self.logger.debug(f"Response: {response}")
    return response
```
Suggested fix:

```python
import backoff

@backoff.on_exception(backoff.expo, Exception, max_tries=4, max_time=60)
def call_with_retry(self, **kwargs):
    return self.openai_client.chat.completions.create(**kwargs)
```
Where: examples/evals/realtime_evals/tests/test_dynamic_result_columns.py:97
What we found: Found 4 hardcoded model strings in this file, none routed through env vars. This blocks A/B tests against cheaper models, prevents quick rollback when a vendor releases a better-priced equivalent, and forces a code deploy for every routing change. Introduce env vars (MODEL_PRIMARY, MODEL_RERANK, MODEL_BATCH).

Suggested fix:

```python
# before
model="assistant"
# after (requires `import os` at the top of the file)
model=os.getenv("MODEL_PRIMARY", "assistant")
```
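The env-var indirection above generalizes to a tiny helper shared across call sites. A minimal sketch assuming the report's suggested env-var names (MODEL_PRIMARY, MODEL_RERANK, MODEL_BATCH); `model_for` is a hypothetical helper, not an existing utility in the repo.

```python
import os

def model_for(env_var, default):
    """Resolve a model name from the environment, falling back to the
    string the call site currently hardcodes."""
    return os.getenv(env_var, default)

# call sites change from  model="assistant"  to:
model = model_for("MODEL_PRIMARY", "assistant")
```

Swapping to a cheaper model, or rolling back after a bad swap, then becomes an env change rather than a code deploy.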
Where: examples/utils/embeddings_utils.py:18
What we found: Found 4 hardcoded model strings in this file, none routed through env vars. This blocks A/B tests against cheaper models, prevents quick rollback when a vendor releases a better-priced equivalent, and forces a code deploy for every routing change. Introduce env vars (MODEL_PRIMARY, MODEL_RERANK, MODEL_BATCH).

Suggested fix:

```python
# before
model="text-embedding-3-small"
# after (requires `import os` at the top of the file)
model=os.getenv("MODEL_PRIMARY", "text-embedding-3-small")
```
Where: examples/partners/agentic_governance_guide/promptfoo/promptfoo_target.py:78
What we found: Found 4 hardcoded model strings in this file, none routed through env vars. This blocks A/B tests against cheaper models, prevents quick rollback when a vendor releases a better-priced equivalent, and forces a code deploy for every routing change. Introduce env vars (MODEL_PRIMARY, MODEL_RERANK, MODEL_BATCH).

Suggested fix:

```python
# before
model="gpt-5.2"
# after (requires `import os` at the top of the file)
model=os.getenv("MODEL_PRIMARY", "gpt-5.2")
```
Where: examples/voice_solutions/realtime_translation_guide/browser-translation-demo/test/server.test.js:114
What we found: Found 3 hardcoded model strings in this file, none routed through env vars. This blocks A/B tests against cheaper models, prevents quick rollback when a vendor releases a better-priced equivalent, and forces a code deploy for every routing change. Introduce env vars (MODEL_PRIMARY, MODEL_RERANK, MODEL_BATCH).

Suggested fix:

```javascript
// before
model="gpt-realtime-translate"
// after (server.test.js is JavaScript, so use process.env rather than os.getenv)
model = process.env.MODEL_PRIMARY || "gpt-realtime-translate"
```
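Because server.test.js is JavaScript, the same indirection goes through `process.env` rather than `os.getenv`. A minimal sketch: `modelFor` is a hypothetical helper, and the env-var name follows the report's MODEL_PRIMARY suggestion.

```javascript
// Resolve a model name from the environment, falling back to the
// string the call site currently hardcodes.
function modelFor(envVar, fallback) {
  return process.env[envVar] || fallback;
}

// call sites change from  model: "gpt-realtime-translate"  to:
const model = modelFor("MODEL_PRIMARY", "gpt-realtime-translate");
```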
The v1 audit does not include the per-call-site cost table shown in the public sample report; that table requires your billing CSV, uploaded during intake (coming in v2). The findings above are based on static code analysis only, with estimated $/mo savings calibrated to mid-size SaaS workloads. For a calibrated cost table, email miloantaeus@gmail.com with your last 30 days of billing CSV and we will regenerate the report at no extra charge.
Why this matters: vendors have a strong incentive to inflate projected savings. The re-audit voucher creates an accountability loop, binding our reputation to actual outcomes rather than promises. If you implement none of the recommendations, the outcome is on you; if you implement all of them and your bill still goes up, we refund.