LIVE REAL-REPO RUN · CLEAN RESULT · Ran the $39 Anthropic Prompt Library Audit analyzer against github.com/anthropics/anthropic-cookbook (99 files). Only 1 finding, and it's HIGH severity. This is what an actually-clean prompt library looks like. Compare with the X-Ray demo on the same repo (18 findings, $4,673/mo, from a different angle). Same engine, same repo, very different findings: proof the analyzer doesn't manufacture findings. Order your own $39 audit →
Anthropic Prompt Library Audit · by Milo Antaeus

Your Anthropic Prompt Library Audit Report

Static-analysis prompt-caching audit · https://github.com/anthropics/anthropic-cookbook · Generated 2026-05-16 22:03 UTC

Files scanned: 99 · Anthropic call sites: 12 · Patterns checked: 4 · Confidence: deterministic (no LLM-in-the-loop)

Executive summary

1 ranked Anthropic prompt-caching opportunity across 12 Anthropic call sites (99 total files scanned). Implementing it could save approximately $0/month ($0/year).

Note: estimates use Sonnet rates ($3/M input, $0.30/M cache-read = 90% off) calibrated to mid-volume workloads (50K-500K calls/month). Verify in your next billing cycle.

#   Opportunity                                               Severity   $/mo saved
1   Multi-paragraph system prompt duplicated across 1 file    HIGH       $0

TOTAL ESTIMATED MONTHLY SAVINGS: $0

Opportunity #1 · Multi-paragraph system prompt duplicated across 1 file · $0/mo

Confidence: 80% · Rule: system_prompt_duplicated_across_files · Severity: HIGH

Where: capabilities/retrieval_augmented_generation/evaluation/prompts.py:29

What we found: The same multi-paragraph system prompt (length 509 chars) appears more than once in 1 file: the anchor occurrence is at capabilities/retrieval_augmented_generation/evaluation/prompts.py, and the analyzer lists two further copies in that same file. Two problems: (1) any edit to the canonical wording has to be made in every copy, and (2) each copy is cached independently in Anthropic's prompt cache, so the 5-minute prefix cache warmth from one call site does NOT help another. Extract the prompt to a single module-level constant (or a shared prompts/ directory), import it everywhere, and the cache becomes shared across all call sites.

Before (capabilities/retrieval_augmented_generation/evaluation/prompts.py:29)

# capabilities/retrieval_augmented_generation/evaluation/prompts.py:29
SYSTEM_PROMPT = """
    You have been tasked with helping us to answer the following query:
    <query>
    {input_query}
    </query>
    You have access to the following documents which are meant to provide context as..."""
# (and duplicated in 0 other file(s))

After

# prompts/system.py (single source of truth):
SYSTEM_PROMPT = """...the canonical multi-paragraph prompt..."""

# all call sites:
import anthropic
from prompts.system import SYSTEM_PROMPT

client = anthropic.Anthropic()
client.messages.create(
    system=[{"type": "text", "text": SYSTEM_PROMPT,
             "cache_control": {"type": "ephemeral"}}],  # shared prefix -> shared cache
    ...  # remaining call arguments unchanged
)

How Anthropic prompt caching works

Anthropic's prompt cache lets you mark static portions of your prompt (system instructions, retrieval context, few-shot examples) with cache_control: {"type": "ephemeral"}. The first call writes the cache (1.25x base input rate); subsequent calls within ~5 minutes read from the cache at 0.1x base rate (a 90% discount). For Sonnet, that's $0.30/M cached tokens vs $3/M base.
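
To see the cache working in practice, compare the usage block on two back-to-back calls. The sketch below is illustrative rather than taken from the audited repo: the model id and prompt text are placeholders, and it assumes the standard anthropic Python SDK, which reports cache writes and reads as cache_creation_input_tokens and cache_read_input_tokens.

# Illustrative sketch, not from the audited repo. First call writes the cache;
# a repeat call within ~5 minutes reads it. Model id and prompt are placeholders.
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment
STATIC_SYSTEM_PROMPT = "..."    # stand-in for a long, unchanging system prompt

def ask(question):
    return client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model id
        max_tokens=256,
        system=[{"type": "text", "text": STATIC_SYSTEM_PROMPT,
                 "cache_control": {"type": "ephemeral"}}],
        messages=[{"role": "user", "content": question}],
    )

first = ask("What does the context say about refunds?")
second = ask("Summarize the refund policy in one line.")

# The first call pays the 1.25x write rate; the second pays the 0.1x read rate.
print(first.usage.cache_creation_input_tokens, first.usage.cache_read_input_tokens)
print(second.usage.cache_creation_input_tokens, second.usage.cache_read_input_tokens)

Note that Anthropic only caches prefixes above a minimum length (on the order of 1,024 tokens for Sonnet models), so a short stand-in prompt like the one above would not actually trigger a cache write.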

The math gets dramatic at moderate scale: a 4K-token system prompt called 100K times/month costs $1,200 uncached vs $150 cached ($120 reads + ~$30 amortized writes). That's $1,050/month saved on a single block — and most production workloads have 3-5 such blocks.
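
For anyone who wants to check that arithmetic, here it is as a few lines of Python. The rates are the Sonnet figures quoted above; the 2,000 cache-write events per month is an assumed number chosen to match the ~$30 amortized-write figure.

# Back-of-envelope: 4K-token static prompt, 100K calls/month, Sonnet rates from above.
PROMPT_TOKENS = 4_000
CALLS_PER_MONTH = 100_000
BASE_RATE = 3.00 / 1_000_000     # $ per input token ($3/M)
WRITE_RATE = BASE_RATE * 1.25    # cache-write premium
READ_RATE = BASE_RATE * 0.10     # cache-read rate (90% discount)

uncached = PROMPT_TOKENS * CALLS_PER_MONTH * BASE_RATE   # $1,200
reads = PROMPT_TOKENS * CALLS_PER_MONTH * READ_RATE      # $120
writes = PROMPT_TOKENS * 2_000 * WRITE_RATE              # ~$30 (assumes 2,000 cache writes/month)
print(f"uncached ${uncached:,.0f}/mo vs cached ${reads + writes:,.0f}/mo "
      f"(saves ${uncached - reads - writes:,.0f}/mo)")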

Cache scope: the cache key is the entire prefix up to (and including) the last cache_control marker. So order matters: put the most-static content first, then less-static, then the per-call variable content last. Anthropic supports up to 4 cache breakpoints per request.
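
A sketch of that ordering with two breakpoints follows; the block contents and model id are placeholders, and the point is only the ordering and the cache_control markers.

# Illustrative ordering: most-static content first, per-call content last.
# Anthropic allows up to 4 cache_control breakpoints per request.
import anthropic

client = anthropic.Anthropic()

STATIC_INSTRUCTIONS = "..."   # placeholder: instructions that never change
RETRIEVED_CONTEXT = "..."     # placeholder: context that changes occasionally
user_question = "..."         # placeholder: per-call variable content

response = client.messages.create(
    model="claude-sonnet-4-20250514",   # placeholder model id
    max_tokens=512,
    system=[
        # Breakpoint 1: reused on every call, so it sits first in the prefix.
        {"type": "text", "text": STATIC_INSTRUCTIONS,
         "cache_control": {"type": "ephemeral"}},
        # Breakpoint 2: changes less often than the user turn, more often than the instructions.
        {"type": "text", "text": RETRIEVED_CONTEXT,
         "cache_control": {"type": "ephemeral"}},
    ],
    # Variable content goes last so it never invalidates the cached prefix.
    messages=[{"role": "user", "content": user_question}],
)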

30-day re-audit voucher

Included with your $39 audit: a voucher for a free re-audit 30 days after delivery. Implement the recommended cache_control changes, then re-submit the same repo URL via reply email; we re-run the analysis and confirm the cacheable blocks are now wrapped. If we still flag any of the CRITICAL findings from this report, a refund is issued automatically.

Why this matters: Anthropic cache savings only materialize once the code change ships. The re-audit voucher creates an accountability loop — we can't claim "issue resolved" unless the v1 ruleset agrees on re-scan. Same deterministic engine, same file paths, same line numbers. No moving goalposts.