Milo Antaeus

Anthropic Claude cache_control Missing—How to Find It

2026-05-17 LLM Cost Optimization 8 min read

Anthropic's cache_control parameter is one of the most powerful cost-saving features in the Claude API. When used correctly, it can reduce input token costs by up to 90% for workloads with repeated context. But here's the problem: it's easy to accidentally leave it out, and when you do, you're paying full price for every request.

This guide shows you how to detect when cache_control is missing from your Claude API calls and what to do about it.

What is cache_control and Why Does It Matter?

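The cache_control parameter marks a point in your input (typically the end of your system prompt or other static context) up to which Anthropic should cache the prompt prefix. On subsequent requests that begin with the same prefix, the cached tokens are billed at a steep discount instead of the full input rate. At the time of writing, cache reads cost about 10% of the base input price, while the request that writes the cache costs about 25% more than normal.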

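For example, if you have a 2,000-token system prompt and you're making 1,000 requests per day with it, that's 2 million prompt tokens per day. (Prompts shorter than the model's minimum cacheable length, 1,024 tokens on many models, are not cached at all, so very short system prompts see no benefit.) With effective caching, most of those requests read the system prompt from cache at the discounted rate, and only the occasional cache write is billed at a premium. Your new content is billed normally either way.

Without caching, you're paying full price for all 2 million prompt tokens every single day.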
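To make the arithmetic concrete, here is a rough sketch of the comparison. The price multipliers and the function name daily_prompt_cost are assumptions for illustration (the multipliers reflect Anthropic's published pricing at the time of writing), so check the current pricing page before relying on the numbers.

```python
# Illustrative daily input-cost comparison for a static, cacheable prompt.
# Both multipliers are assumptions based on published pricing at the time
# of writing; verify them against the current pricing page.
CACHE_WRITE_MULTIPLIER = 1.25  # assumed: default cache write vs. base input price
CACHE_READ_MULTIPLIER = 0.10   # assumed: cache read vs. base input price


def daily_prompt_cost(prompt_tokens, requests, price_per_mtok, cache_writes=0):
    """Return the daily cost in dollars of the static prompt tokens only.

    cache_writes is the number of requests that write (or rewrite) the
    cache. A value of 0 means caching is not used at all.
    """
    per_token = price_per_mtok / 1_000_000
    if cache_writes == 0:
        # Every request pays the full input price for the whole prompt.
        return prompt_tokens * requests * per_token
    reads = requests - cache_writes
    write_cost = prompt_tokens * cache_writes * per_token * CACHE_WRITE_MULTIPLIER
    read_cost = prompt_tokens * reads * per_token * CACHE_READ_MULTIPLIER
    return write_cost + read_cost
```

With hypothetical inputs of a 2,000-token prompt, 1,000 requests per day, a base price of $3 per million input tokens, and 10 cache writes per day, this gives $6.00 per day uncached versus roughly $0.67 cached.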

Why cache_control Gets Left Out

1. SDK Changes
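Prompt caching started as a beta feature that required an opt-in beta header and a beta namespace in the SDK, and it later became generally available on the standard Messages API. Code that worked six months ago might be using an outdated API pattern.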

2. Abstraction Layers
If you're using a wrapper library or an AI orchestration tool, it may not expose cache_control directly, or it might use different parameter names.

3. Different Providers
If your codebase uses both OpenAI and Anthropic, it's easy to use the OpenAI pattern everywhere and miss the Anthropic-specific caching feature.

4. Dynamic Prompts
Developers assume that because their prompts are dynamic, caching won't help. But even partial caching (system prompt + instruction prefix) can yield significant savings.

How to Find Missing cache_control in Your Codebase

Step 1: Search for Anthropic API Calls

grep -rnE "anthropic|claude|messages\.create" --include="*.py" --include="*.js" --include="*.ts" .

Find every place you're calling the Claude API.

Step 2: Check Each Call for cache_control

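Look at each API call and check whether cache_control is present. Specifically, look for a cache_control field on at least one content block in the request: in the system list, in a message's content list, or on a tool definition. A system prompt passed as a plain string cannot carry cache_control.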
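One way to make this check mechanical is to inspect the request arguments before they are sent. The sketch below assumes the request is available as a plain dict of the keyword arguments you would pass to client.messages.create(). The helper name blocks_with_cache_control is hypothetical and not part of the Anthropic SDK.

```python
def blocks_with_cache_control(request):
    """Return the request sections that carry a cache_control marker.

    `request` is a dict of keyword arguments for client.messages.create().
    This is a hypothetical helper for auditing, not an SDK function.
    """
    found = []

    def has_marker(blocks):
        # A plain string (for example system="...") cannot carry cache_control.
        if not isinstance(blocks, list):
            return False
        return any(isinstance(b, dict) and "cache_control" in b for b in blocks)

    if has_marker(request.get("tools")):
        found.append("tools")
    if has_marker(request.get("system")):
        found.append("system")
    for message in request.get("messages", []):
        if has_marker(message.get("content")):
            found.append("messages")
            break
    return found
```

An empty result means no block in the request asks for caching, which is the case this guide is trying to catch.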

Step 3: Verify cache_control Is Actually Being Sent Correctly

Having cache_control in your code isn't enough—it needs to be formatted correctly. In the Anthropic SDK, you typically use it like this:

from anthropic import Anthropic
client = Anthropic()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "Your system prompt here",
            "cache_control": {"type": "ephemeral"}  # This is the key part
        }
    ],
    messages=[{"role": "user", "content": "Hello"}]
)

If you're not seeing cache_control on any block in the request (system, tools, or message content), caching isn't active for that call.

Step 4: Instrument for Cache Hit Rate

Even if you think caching is configured, you need to verify it's actually working. Add logging to track cache hit rates:

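import logging
logger = logging.getLogger(__name__)

# After each API call, read the cache counters from the usage stats
usage = response.usage
cache_read = getattr(usage, "cache_read_input_tokens", 0) or 0       # tokens served from cache
cache_write = getattr(usage, "cache_creation_input_tokens", 0) or 0  # tokens written to cache
logger.info(f"Uncached input: {usage.input_tokens}, cache read: {cache_read}, cache write: {cache_write}")

A cache hit shows a nonzero cache_read_input_tokens value, and the request that populates the cache shows a nonzero cache_creation_input_tokens value. If both stay at zero across repeated requests, caching is not taking effect. Note that input_tokens counts only the uncached portion of the prompt, so on its own it is not a reliable signal.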
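Per-request logs are easier to act on when rolled up into a single ratio. The Messages API reports cache activity in response.usage through the cache_read_input_tokens and cache_creation_input_tokens fields alongside input_tokens. The sketch below aggregates them, and the function name cache_read_ratio is an illustrative assumption.

```python
def cache_read_ratio(usages):
    """Return the fraction of all input tokens that were served from cache.

    `usages` is an iterable of response.usage objects (or anything with the
    same attributes). Missing or None counters are treated as zero.
    """
    read = written = uncached = 0
    for usage in usages:
        read += getattr(usage, "cache_read_input_tokens", 0) or 0
        written += getattr(usage, "cache_creation_input_tokens", 0) or 0
        uncached += getattr(usage, "input_tokens", 0) or 0
    total = read + written + uncached
    return read / total if total else 0.0
```

A ratio near zero on a high-volume endpoint with a large static prompt suggests that cache_control is missing or that the cached prefix is changing between requests.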

Common Mistakes and How to Fix Them

Mistake 1: Using cache_control on the Wrong Block

cache_control must be on individual content blocks, not on the top-level request. Make sure it's inside your system list or messages list items.

Mistake 2: Setting Wrong Cache Type

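The only supported type is "ephemeral". It is not scoped to a session: a cache entry lives for a short time-to-live (5 minutes by default, refreshed each time the entry is read), and a longer 1-hour TTL can be requested through an optional ttl field at a higher write price. There is no permanent cache, so don't design around persistent caching.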

Mistake 3: Not Accounting for Cache Expiry

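The ephemeral cache has a time limit: an entry that is not read within its TTL expires, and the next request pays the cache-write price again. There are also structural limits, such as a minimum cacheable prompt length and a cap on the number of cache breakpoints per request. If your traffic has gaps longer than the TTL, monitor cache writes against cache reads and consider a longer TTL or restructuring how requests are batched.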
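A small client-side tracker can estimate whether a cached prefix is likely still warm when a request goes out. This is only a local guess under stated assumptions: the 300-second default mirrors the documented 5-minute TTL, the class name CacheWarmthTracker is hypothetical, and the authoritative signal remains the cache counters in response.usage.

```python
import time


class CacheWarmthTracker:
    """Estimate, per cache key, whether a cached prefix is still warm.

    Hypothetical helper: it tracks only local request times and assumes
    each request refreshes the TTL.
    """

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._last_used = {}

    def is_probably_warm(self, key):
        last = self._last_used.get(key)
        return last is not None and (self._clock() - last) < self.ttl_seconds

    def record_request(self, key):
        """Record a request and return whether the cache was likely warm."""
        was_warm = self.is_probably_warm(key)
        self._last_used[key] = self._clock()
        return was_warm
```

Counting how often record_request returns False for a given prompt gives a rough picture of how many cache rewrites low-traffic periods are costing you.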

Mistake 4: Cache Control on Dynamic Content

If your system prompt changes every request, caching won't help. Separate static instructions (cache them) from dynamic content (don't).
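Because caching works on the prompt prefix, the static part has to come first and carry the cache_control marker, with the dynamic part placed after it. A minimal sketch of that split follows. The function name build_system_blocks and its arguments are illustrative assumptions.

```python
def build_system_blocks(static_instructions, dynamic_context=""):
    """Build a `system` list with a cached static prefix.

    The static block comes first and carries cache_control, so it forms a
    stable prefix. The dynamic block follows and is left uncached.
    """
    blocks = [
        {
            "type": "text",
            "text": static_instructions,
            "cache_control": {"type": "ephemeral"},
        }
    ]
    if dynamic_context:
        # Per-request content goes after the cache breakpoint.
        blocks.append({"type": "text", "text": dynamic_context})
    return blocks
```

The returned list can be passed as the system argument of client.messages.create(). Anything that changes per request, such as timestamps or user details, belongs in the second block or in the messages.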

Automating Detection Across Your Codebase

If you have multiple Claude integration points, manually checking each one doesn't scale. You need automated detection.
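As a starting point for a home-grown check, a short static scan over Python sources can flag messages.create(...) calls that never mention cache_control in their arguments. This is a heuristic sketch, and the function name find_uncached_create_calls is an assumption. It will also flag calls whose blocks are built elsewhere and passed in as variables, so treat the results as a review list, not a verdict.

```python
import ast


def find_uncached_create_calls(source, filename="<string>"):
    """Return line numbers of *.messages.create(...) calls whose arguments
    contain no "cache_control" string literal or keyword argument."""
    tree = ast.parse(source, filename=filename)
    flagged = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_create = (
            isinstance(func, ast.Attribute)
            and func.attr == "create"
            and isinstance(func.value, ast.Attribute)
            and func.value.attr == "messages"
        )
        if not is_create:
            continue
        # Look for cache_control anywhere inside the call expression.
        mentions_cache = any(
            (isinstance(n, ast.Constant) and n.value == "cache_control")
            or (isinstance(n, ast.keyword) and n.arg == "cache_control")
            for n in ast.walk(node)
        )
        if not mentions_cache:
            flagged.append(node.lineno)
    return sorted(flagged)
```

Running this over each .py file found in Step 1 and printing the filename with the flagged line numbers gives a quick audit that can run in CI.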

Anthropic Prompt Library Audit can automatically identify Claude API calls in your infrastructure, detect whether cache_control is properly configured, and alert you when high-volume calls are missing cache optimization. This turns a tedious manual audit into automated monitoring.

For teams running significant Claude workloads, ensuring cache_control is properly implemented across all endpoints is one of the highest-ROI optimizations you can make. Explore the full audit suite.