Milo Antaeus · Blog

7 Datadog cost patterns that explain 80% of overspend (with code samples)

Static analysis of StatsD client code, datadog.yaml, and Terraform datadog_* resources surfaces the patterns that compound month over month. The commonly cited 20-30% Datadog savings come from exactly these. Numbers are calibrated to mid-volume workloads; verify them in your next billing cycle.

Published 2026-05-17 · Engine: deterministic regex/AST · Zero LLM-in-the-loop · Sample report: $8,400/mo across 7 findings

TL;DR: Datadog bills explode at the intersection of cardinality (number of distinct tag values), retention (how long data stays queryable), and instrumentation density (how many metrics, logs, and traces are emitted per request). The 7 patterns below cover almost every avoidable cost vector. Most are 1-line config or code changes. A static scan finds them in 5 minutes; the fixes ship as PR-size diffs.

The patterns + what they cost you

1. Cardinality bomb on tags · CRITICAL

Typical savings: $1,500 – $8,000/mo per occurrence

The pattern: tagging metrics with user_id, email, session_id, request_id, or any other value with a high distinct count. Datadog bills custom metrics at $0.05 per 100 timeseries per month, and every distinct combination of metric name and tag values creates a new timeseries.

- statsd.increment('login.attempt', tags=[f'user_id:{user.id}'])
+ statsd.increment('login.attempt', tags=['plan:' + user.plan])  # bucketed

For a SaaS with 50K active users, one login metric tagged with user_id produces 50K timeseries: 50K × $0.05 / 100 = $25/mo for that metric. Across 30 metrics tagged the same way over 12 months, that is $25 × 30 × 12 = $9,000/yr from one carelessly tagged dimension. The static scan flags every user_id, email, uuid, and other known-high-cardinality literal in the tags array.
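The flagging step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the audit's actual implementation. It only inspects single-line `statsd.*` calls. The denylist `HIGH_CARDINALITY_KEYS` and the function `find_cardinality_bombs` are hypothetical names chosen for illustration.

```python
import re

# Tag keys treated as high-cardinality. Hypothetical denylist for this sketch;
# extend it with the identifiers your own codebase uses.
HIGH_CARDINALITY_KEYS = {"user_id", "email", "session_id", "request_id", "uuid"}

# A single-line statsd call; group 1 is everything between the outer parentheses.
STATSD_CALL = re.compile(
    r"statsd\.(?:increment|decrement|gauge|histogram|distribution|timing|set)\((.*)\)"
)
# The key part of a tag literal such as 'plan: or f'user_id: (quote, key, colon).
TAG_KEY = re.compile(r"""[fF]?["']([A-Za-z_][A-Za-z0-9_.]*):""")


def find_cardinality_bombs(source: str, filename: str = "<memory>") -> list[tuple[str, int, str]]:
    """Return (file, line, tag_key) for each denylisted tag key in a statsd call."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        call = STATSD_CALL.search(line)
        if call is None:
            continue
        for key in TAG_KEY.findall(call.group(1)):
            if key in HIGH_CARDINALITY_KEYS:
                findings.append((filename, lineno, key))
    return findings


snippet = "statsd.increment('login.attempt', tags=[f'user_id:{user.id}'])"
print(find_cardinality_bombs(snippet, "auth/views.py"))
```

A real scanner would walk the AST so that multi-line calls and tags assembled in variables are also caught. The regex version gives up that coverage in exchange for brevity.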

2. Missing log exclusion filters · CRITICAL

Typical savings: $2,000 – $18,000/mo

The pattern: ingesting health-check logs, debug logs, or framework noise without exclusion filters. Datadog log ingestion is $0.10/GB. Indexing adds $1.70 per million log events for 15-day retention and $2.50 per million for 30-day retention.

The static scan checks datadog.yaml + integration configs for missing exclusion patterns on routes known to be noisy (/health, /ready, /_status, /favicon.ico, /_next/) and on log levels (DEBUG, TRACE) emitted from production. A typical web service generates 30-60% of its log volume from health checks alone.

+ # datadog.yaml — exclude health-check noise
+ logs_config:
+   processing_rules:
+     - type: exclude_at_match
+       name: drop_health_checks
+       pattern: 'GET /(health|ready|_status|_next/static)'
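Before shipping a rule like this, it helps to check what it would actually drop. The sketch below is a Python approximation. It assumes the Agent's `exclude_at_match` drops a log whenever the pattern matches anywhere in the line. The Agent itself uses Go's regex engine, which agrees with Python's `re` for a simple alternation like this one. `split_logs` and the sample lines are made up for illustration.

```python
import re

# Python copy of the exclude_at_match pattern above (assumption: the Agent
# applies it as an unanchored search over each log line).
HEALTH_CHECK = re.compile(r"GET /(health|ready|_status|_next/static)")


def split_logs(lines):
    """Partition log lines into (kept, dropped) the way the rule would."""
    kept, dropped = [], []
    for line in lines:
        if HEALTH_CHECK.search(line):
            dropped.append(line)
        else:
            kept.append(line)
    return kept, dropped


sample = [
    '10.0.0.1 - - "GET /health HTTP/1.1" 200',
    '10.0.0.1 - - "GET /ready HTTP/1.1" 200',
    '10.0.0.2 - - "POST /api/orders HTTP/1.1" 201',
    '10.0.0.3 - - "GET /_next/static/chunks/app.js HTTP/1.1" 200',
]
kept, dropped = split_logs(sample)
print(f"dropped {len(dropped)}/{len(sample)} lines ({len(dropped) / len(sample):.0%})")
```

The pattern is unanchored after the route, so it also drops `/healthz` or `/health/live`. Add a trailing delimiter to the group if you only want exact routes.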

3. APM sample rate 1.0 in production · HIGH

Typical savings: $3,000 – $11,800/mo

The pattern: production tracing configured with DD_TRACE_SAMPLE_RATE=1.0 (100%). Indexed spans are billed per million. For any service above roughly 10 RPS sustained, this can become the single largest line item on the bill.

- DD_TRACE_SAMPLE_RATE=1.0  # 100% in prod = $$$
+ DD_TRACE_SAMPLE_RATE=0.1  # 10% baseline
+ # Bring 100% to specific high-value endpoints via sampling rules

The scan flags 1.0, 1, or 100 values across .env, datadog.yaml, Helm values, ECS task definitions, and Terraform files. On a service emitting 1B spans/mo, dropping the sample rate from 100% to 10% changes that line item by $10K+/mo.
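A minimal Python sketch of this check follows. It assumes the variable appears on a single line, either as `KEY=VALUE` (.env) or `KEY: VALUE` (YAML). Helm and ECS definitions that split `name:` and `value:` across lines would need a real YAML/JSON parser instead. `flag_full_sampling` is a hypothetical helper, not part of any Datadog tooling.

```python
import re

# DD_TRACE_SAMPLE_RATE=1.0 (.env) or DD_TRACE_SAMPLE_RATE: "1.0" (YAML); value captured.
SAMPLE_RATE = re.compile(r"""DD_TRACE_SAMPLE_RATE\s*[=:]\s*["']?(\d+(?:\.\d+)?)""")
# Values the article treats as "100% sampling".
FULL_SAMPLING = {1.0, 100.0}


def flag_full_sampling(text: str, filename: str) -> list[tuple[str, int, str]]:
    """Return (file, line, raw_value) for every single-line full-sampling setting."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = SAMPLE_RATE.search(line)
        if match and float(match.group(1)) in FULL_SAMPLING:
            findings.append((filename, lineno, match.group(1)))
    return findings


env_file = "DD_ENV=prod\nDD_TRACE_SAMPLE_RATE=1.0  # 100% in prod\n"
print(flag_full_sampling(env_file, ".env"))
```

Values are compared numerically, so `1`, `1.0`, and `"100"` all count as full sampling, while `0.1` or `10` pass.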

4. Synthetic test cadence too frequent · HIGH

Typical savings: $600 – $2,400/mo

The pattern: Datadog Synthetics tests running every 60s when a 5-15 min cadence suffices. API test runs are billed at $5 per 10,000 ($0.0005 each) and browser test runs at $12 per 1,000 ($0.012 each). An API test that runs every minute (1,440/day, about 43,200/month) costs about $21.60/mo; the same test at a 5-minute cadence (8,640 runs/month) costs about $4.32/mo. Multiply by 50-200 tests per org.
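The cadence arithmetic is easy to script. Here is a rough sketch that assumes a 30-day month and a flat per-run price you supply; the example uses $0.0005 per API test run ($5 per 10,000). The `locations` multiplier reflects an additional assumption: each configured location executes, and bills, its own run. Check that against your contract. The function names are hypothetical.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def runs_per_month(tick_every_s: int, locations: int = 1) -> int:
    """Test executions per month at a cadence (assumes one run per location per tick)."""
    return SECONDS_PER_MONTH // tick_every_s * locations


def monthly_cost(tick_every_s: int, price_per_run: float, locations: int = 1) -> float:
    """Monthly spend for one test at the given cadence and per-run price."""
    return runs_per_month(tick_every_s, locations) * price_per_run


API_PRICE = 0.0005  # assumption: $5 per 10,000 API test runs
per_minute = monthly_cost(60, API_PRICE)
per_5_min = monthly_cost(300, API_PRICE)
print(f"60s: ${per_minute:.2f}/mo, 300s: ${per_5_min:.2f}/mo, "
      f"saving across 50 tests: ${(per_minute - per_5_min) * 50:,.2f}/mo")
```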

# terraform/datadog.tf
- tick_every = 60   # every minute = expensive
+ tick_every = 300  # every 5 minutes (most tests don't need per-minute checks)

Static scan looks for tick_every in Terraform datadog_synthetics_test and flags values below 300s for any test type other than uptime-critical paths. Most teams have 50-200 synthetics with default 60s cadence.
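A sketch of that check in Python follows. It assumes `tick_every` is written as a literal integer on its own line; values set through variables or locals would need an HCL parser. It does not try to recognize uptime-critical tests. `flag_fast_synthetics` is a hypothetical name.

```python
import re

# tick_every = <int> on its own line; [ \t]* (not \s*) keeps the match on one line,
# so match.start() points at the line that actually holds the setting.
TICK_EVERY = re.compile(r"^[ \t]*tick_every[ \t]*=[ \t]*(\d+)", re.MULTILINE)


def flag_fast_synthetics(tf_source: str, filename: str, min_interval_s: int = 300) -> list[tuple[str, int, int]]:
    """Return (file, line, seconds) for each tick_every literal below the threshold."""
    findings = []
    for match in TICK_EVERY.finditer(tf_source):
        seconds = int(match.group(1))
        if seconds < min_interval_s:
            lineno = tf_source.count("\n", 0, match.start()) + 1
            findings.append((filename, lineno, seconds))
    return findings
```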

5. Log indexes retained longer than needed · MEDIUM

Typical savings: $400 – $1,800/mo

The pattern: log indexes that retain old logs no one queries. Datadog charges for indexed logs by volume and retention period. At the rates above ($2.50 for 30-day vs $1.70 for 15-day), a 30-day index therefore costs about 47% more than a 15-day index over the same volume. The static scan inspects datadog.yaml and Terraform datadog_logs_index resources for indexes with retention over 15 days, then reports which log sources land in each.

6. Infrastructure-agent autodiscovery sprawl · MEDIUM

Typical savings: $200 – $3,598/mo per cluster

The pattern: kubernetes_state_core, kubernetes_apiserver_metrics, kubelet, and kube_dns all enabled on every node by default. Datadog bills custom metrics from autodiscovery, so a 50-node K8s cluster running these defaults can emit 80K+ custom metrics/hour purely from infrastructure churn. The static scan reads cluster_check_runner_config.yaml and Helm values, and flags any infra integration that lacks a business-metric whitelist.

7. Unbounded RUM session sampling · MEDIUM

Typical savings: $700 – $7,400/mo

The pattern: Datadog Real User Monitoring (RUM) with sessionSampleRate: 100 (sampleRate in older SDK versions), so every session is sampled. RUM is $1.50/1000 sessions and $5/1000 replays. Take a consumer app with 100K sessions a month (for example, 100K MAU at one session each). At full sampling it pays 100K × $0.0015 = $150/mo on sessions, plus another $500/mo for session replay at 100%. Most teams need a 10-25% session sample plus 5% session replay for diagnostic-quality data.

- datadogRum.init({ sessionSampleRate: 100, sessionReplaySampleRate: 100 });
+ datadogRum.init({ sessionSampleRate: 25, sessionReplaySampleRate: 5 });  // sessionSampleRate is named sampleRate in older SDKs
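Here is the session and replay arithmetic as a small Python function, using the article's rates of $1.50 per 1,000 sessions and $5 per 1,000 replays. It assumes the replay rate is a percentage of the sessions that were already sampled; check the SDK docs for your version. `rum_monthly_cost` is a hypothetical name.

```python
def rum_monthly_cost(
    sessions: int,
    session_sample_pct: float,
    replay_sample_pct: float,
    per_1k_sessions: float = 1.50,
    per_1k_replays: float = 5.00,
) -> float:
    """Monthly RUM + replay cost; the replay rate applies to already-sampled sessions."""
    sampled_sessions = sessions * session_sample_pct / 100
    replays = sampled_sessions * replay_sample_pct / 100
    return sampled_sessions / 1000 * per_1k_sessions + replays / 1000 * per_1k_replays


before = rum_monthly_cost(100_000, session_sample_pct=100, replay_sample_pct=100)
after = rum_monthly_cost(100_000, session_sample_pct=25, replay_sample_pct=5)
print(f"before: ${before:,.2f}/mo  after: ${after:,.2f}/mo")
```

At 100/100 this reproduces the $150 + $500 figure. At 25/5, the same 100K sessions cost $37.50 + $6.25 = $43.75/mo under these assumptions.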

Why deterministic static analysis, not "AI told me your bill is high"

Every audit product I ship is zero-LLM-in-the-loop. The reasons:

100% reproducible findings. Run the analyzer on the same repo twice, get identical output. LLM-based code review can hallucinate $X savings or invent issues that don't exist.
0% hallucination rate. Regex + AST either match a pattern or they don't. There's no "the model thinks this might be a problem."
You can verify each finding yourself. Each result includes file:line. Open the file, see the pattern, agree or disagree.
30-day re-audit voucher. Implement the fixes, re-submit the repo, the analyzer re-runs and confirms which findings closed. Verifiable in your Datadog billing console.
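Here is one way a re-audit could decide which findings closed, sketched in Python under explicit assumptions. Each finding carries a rule id and the matched snippet. Findings are compared on (path, rule, snippet) rather than on line number, because unrelated edits shift lines without fixing anything. The `Finding` fields and rule ids are hypothetical, not the audit's actual report schema.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    path: str
    line: int
    rule: str     # hypothetical rule id, e.g. "apm-sample-rate"
    snippet: str  # source text the rule matched


def finding_key(f: Finding) -> tuple[str, str, str]:
    # Line numbers are left out on purpose: edits elsewhere in a file move them.
    return (f.path, f.rule, f.snippet.strip())


def diff_audits(before: list[Finding], after: list[Finding]) -> dict[str, list[tuple[str, str, str]]]:
    """Classify findings as closed, still open, or newly introduced between two runs."""
    old = {finding_key(f) for f in before}
    new = {finding_key(f) for f in after}
    # Sorting keeps the report identical across runs on the same input.
    return {
        "closed": sorted(old - new),
        "still_open": sorted(old & new),
        "introduced": sorted(new - old),
    }
```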

What the audit doesn't do

Honesty section. The audit scans STATIC code + config. It does NOT:

Read your live Datadog metrics console (no API key required, no live-data access)
Inspect specific metric values to find anomalies (different problem class)
Recommend dashboards or alert tuning (different consulting product)
Cover non-Datadog observability (New Relic, Honeycomb, Grafana Cloud — each has its own audit pattern, all on the roadmap)

If you have runtime metric anomalies but clean static config, that's a different scan. The Datadog Cost Audit's lane is "your static configuration + instrumentation code is the leak."

$149 · 1-hour delivery · 30-day re-audit voucher

Drop a GitHub repo URL. Within 1 hour, get an HTML report listing every cost-pattern finding ranked by $/mo descending, with before/after fix snippets ready for a PR.

Buy the Datadog Cost Audit ($149), or view the synthetic sample report first ($8,400/mo, 7 findings).