Static analysis of StatsD client code + datadog.yaml + Terraform datadog_* surfaces the patterns that compound month-over-month. Industry consensus: 20-30% Datadog savings come from exactly these. Numbers are calibrated to mid-volume workloads; verify in your next billing cycle.
The pattern: tagging metrics with user_id, email, session_id, request_id, or any value with high distinct-count. Datadog bills custom metrics at $0.05 per 100 timeseries per month, and every distinct tag value creates a new timeseries.
- statsd.increment('login.attempt', tags=[f'user_id:{user.id}'])
+ statsd.increment('login.attempt', tags=['plan:' + user.plan])  # bucketed
A SaaS with 50K active users emitting one metric per login = 50K timeseries × $0.05 / 100 = $25/mo per metric. Multiply by 30 metrics × 12 months = $9,000/yr from one carelessly-tagged dimension. The static scan flags every user_id, email, uuid, and known-high-cardinality literal in the tags array.
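The arithmetic and the tag check above can be sketched in a few lines of Python. This is a minimal illustration, not the audit's actual scanner: the key list, the regex, and the function names are assumptions made for the example, and the rate constant is simply the figure quoted in this section.

```python
import re

# Tag keys treated as high-cardinality. Illustrative list, not exhaustive.
HIGH_CARDINALITY_KEYS = ("user_id", "email", "session_id", "request_id", "uuid")

# Matches a quoted tag literal such as 'user_id:...' or "email:..." in source code.
TAG_KEY_RE = re.compile(r"""['"](?P<key>%s):""" % "|".join(HIGH_CARDINALITY_KEYS))

# Rate quoted in this section: $0.05 per 100 timeseries per month.
USD_PER_100_TIMESERIES = 0.05


def monthly_cost(distinct_tag_values: int, metrics: int = 1) -> float:
    """Estimated monthly cost when each distinct tag value is one timeseries."""
    return distinct_tag_values * metrics * USD_PER_100_TIMESERIES / 100


def find_high_cardinality_tags(source: str) -> list[tuple[int, str]]:
    """Return (line_number, tag_key) for every suspicious tag literal."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in TAG_KEY_RE.finditer(line):
            findings.append((lineno, match.group("key")))
    return findings


if __name__ == "__main__":
    sample = (
        "statsd.increment('login.attempt', tags=[f'user_id:{user.id}'])\n"
        "statsd.increment('login.attempt', tags=['plan:' + user.plan])\n"
    )
    print(find_high_cardinality_tags(sample))
    print(monthly_cost(50_000), monthly_cost(50_000, metrics=30) * 12)
```

A regex pass like this only catches literal tag keys. Tags assembled from variables or helper functions would need an AST-based check.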
The pattern: ingesting health-check logs, debug logs, or framework noise without exclusion filters. Datadog log ingestion is $0.10/GB; retention adds $1.70/GB for 15-day, $2.50/GB for 30-day.
The static scan checks datadog.yaml + integration configs for missing exclusion patterns on routes known to be noisy (/health, /ready, /_status, /favicon.ico, /_next/) and on log levels (DEBUG, TRACE) emitted from production. A typical web service generates 30-60% of its log volume from health checks alone.
+ # datadog.yaml: exclude health-check noise
+ logs_config:
+   processing_rules:
+     - type: exclude_at_match
+       name: drop_health_checks
+       pattern: 'GET /(health|ready|_status|_next/static)'
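Before shipping an exclusion rule, it helps to dry-run the pattern against a handful of log lines. The sketch below uses Python's `re` as a stand-in, on the assumption that it behaves the same as the Agent's regex engine for a pattern this simple. The sample lines and the `split_by_rule` helper are invented for the example.

```python
import re

# Pattern copied from the exclusion rule above.
PATTERN = re.compile(r"GET /(health|ready|_status|_next/static)")

# Hypothetical access-log lines, invented for this example.
SAMPLE_LINES = [
    "GET /health 200 1ms",
    "GET /ready 200 1ms",
    "GET /_next/static/chunks/main.js 200 3ms",
    "GET /api/orders 200 42ms",
    "POST /health 405 1ms",
    "GET /healthcare/plans 200 18ms",
]


def split_by_rule(lines):
    """Partition lines into (dropped, kept), mimicking an exclude-at-match rule."""
    dropped = [line for line in lines if PATTERN.search(line)]
    kept = [line for line in lines if not PATTERN.search(line)]
    return dropped, kept


if __name__ == "__main__":
    dropped, kept = split_by_rule(SAMPLE_LINES)
    print("dropped:", dropped)
    print("kept:", kept)
```

The last sample line is worth noticing: because the pattern is an unanchored prefix match, `GET /healthcare/plans` is dropped too. If real routes share a prefix with a health-check path, tightening the pattern (for example by requiring a space or slash after the route name) avoids discarding business traffic.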
The pattern: production tracing configured with DD_TRACE_SAMPLE_RATE=1.0 (100%). Indexed spans are billed per million; for any service over 10 RPS sustained, this becomes the single largest bill line item.
- DD_TRACE_SAMPLE_RATE=1.0   # 100% in prod = $$$
+ DD_TRACE_SAMPLE_RATE=0.1   # 10% baseline
+ # Bring 100% to specific high-value endpoints via sampling rules
The scan flags values of 1.0, 1, or 100 across .env, datadog.yaml, Helm values, ECS task definitions, and Terraform. Dropping the sample rate from 100% to 10% on a service emitting 1B spans/mo is a $10K+/mo difference on that line item.
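A check of this kind can be approximated with a single regex over config text. The sketch below is an assumption about how such a scan might look, not the audit's implementation. It handles only the `KEY=value` and `KEY: value` forms; the name/value pair layout used in ECS task definitions and some Helm charts would need a structured parser instead.

```python
import re

# Matches KEY=value (.env style) and KEY: value (YAML style), with optional quotes.
RATE_RE = re.compile(
    r"""DD_TRACE_SAMPLE_RATE\s*[=:]\s*['"]?(?P<value>[0-9]*\.?[0-9]+)['"]?"""
)


def find_full_sampling(text):
    """Return (line_number, raw_value) for every sample rate at or above 100%."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = RATE_RE.search(line)
        # Valid rates are fractions between 0 and 1, so anything >= 1.0
        # (including a mistaken "100") means every trace is kept.
        if match and float(match.group("value")) >= 1.0:
            findings.append((lineno, match.group("value")))
    return findings


if __name__ == "__main__":
    env_file = "DD_ENV=prod\nDD_TRACE_SAMPLE_RATE=1.0\n"
    print(find_full_sampling(env_file))
```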
The pattern: Datadog Synthetics tests running every 60s when 5-15 min suffices. Each API test run is $0.0017; browser tests are $0.012 each. An API test that runs every minute (1,440 runs/day) costs $2.45/day, about $73/mo; the same test at a 5-minute cadence (288 runs/day) costs $0.49/day, about $15/mo. Multiply by 50-200 tests per org.
# terraform/datadog.tf
- tick_every = 60    # every minute = expensive
+ tick_every = 300   # every 5 minutes (most monitors don't need sub-minute)
Static scan looks for tick_every in Terraform datadog_synthetics_test and flags values below 300s for any test type other than uptime-critical paths. Most teams have 50-200 synthetics with default 60s cadence.
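The cadence arithmetic is easy to check by hand or in code. The sketch below uses the $0.0017 per-run figure from this section. The function names, the 30-day month, and the `locations` multiplier are assumptions for illustration, so confirm how your contract counts multi-location runs.

```python
SECONDS_PER_DAY = 86_400
USD_PER_API_RUN = 0.0017  # per-run figure used in this section


def runs_per_day(tick_every_seconds: int) -> int:
    """Number of scheduled runs in 24 hours for a given tick_every value."""
    return SECONDS_PER_DAY // tick_every_seconds


def api_test_cost(tick_every_seconds: int, days: int = 1, locations: int = 1) -> float:
    """Estimated cost of one API test over `days`, run from `locations` sites."""
    return runs_per_day(tick_every_seconds) * days * locations * USD_PER_API_RUN


if __name__ == "__main__":
    for tick in (60, 300, 900):
        daily = api_test_cost(tick)
        monthly = api_test_cost(tick, days=30)
        print(f"tick_every={tick}: {runs_per_day(tick)} runs/day, "
              f"${daily:.2f}/day, ${monthly:.2f}/30 days")
```

Moving one test from 60s to 300s cuts its run count by a factor of five. The saving becomes material only once it is multiplied across the whole test inventory.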
The pattern: log indexes that retain old logs no one queries. Datadog charges for indexed volume by retention period; a 30-day index over 100GB/month = $3,500/mo, vs a 15-day index over the same volume = $2,550/mo. Static scan inspects datadog.yaml + Terraform datadog_logs_index for indexes with retention >15 days, then reports which log sources land in each.
The pattern: kubernetes_state.core + kubernetes_apiserver_metrics + kubelet + kube_dns all enabled on every node by default. Datadog bills custom metrics from autodiscovery — a 50-node K8s cluster default-enabled can emit 80K+ custom metrics/hour purely from infrastructure churn. Static scan reads cluster_check_runner_config.yaml + Helm values and flags any infra-integration with no business-metric whitelist.
The pattern: Datadog Real User Monitoring (RUM) with sampleRate: 100 (sample every session). RUM is $1.50/1000 sessions, $5/1000 replays. A consumer app with 100K MAU at 100% sampling, counting one session per user per month: 100K × $0.0015 = $150/mo on sessions, and session replay at 100% adds $500/mo on top. Most teams need 10-25% session sample + 5% session replay for diagnostic-quality data.
- datadogRum.init({ sampleRate: 100, sessionReplaySampleRate: 100 });
+ datadogRum.init({ sampleRate: 25, sessionReplaySampleRate: 5 });
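To compare sampling settings, the RUM arithmetic can be wrapped in a small function. This sketch rests on three assumptions that should be checked against your SDK version and contract: one session per monthly active user, a replay rate that applies only to sessions already kept by the session sample rate, and replay billed on top of the session charge. The function name and return shape are invented for the example.

```python
USD_PER_1000_SESSIONS = 1.50  # session rate quoted in this section
USD_PER_1000_REPLAYS = 5.00   # replay rate quoted in this section


def rum_monthly_cost(sessions: int, session_rate_pct: float, replay_rate_pct: float) -> dict:
    """Estimate monthly RUM spend for the given sampling percentages."""
    # Sessions kept after the session sample rate is applied.
    tracked = sessions * session_rate_pct / 100
    # Assumption: replay sampling applies to the tracked sessions only.
    replayed = tracked * replay_rate_pct / 100
    return {
        "sessions_usd": tracked * USD_PER_1000_SESSIONS / 1000,
        "replays_usd": replayed * USD_PER_1000_REPLAYS / 1000,
    }


if __name__ == "__main__":
    print(rum_monthly_cost(100_000, 100, 100))
    print(rum_monthly_cost(100_000, 25, 5))
```

If your SDK version applies the replay rate to all sessions rather than to tracked ones, replace `tracked` with `sessions` in the `replayed` line and the replay figure rises accordingly.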
Every audit product I ship is zero-LLM-in-the-loop. The reasons:
Honesty section. The audit scans STATIC code + config. It does NOT:
If you have runtime metric anomalies but clean static config, that's a different scan. The Datadog Cost Audit's lane is "your static configuration + instrumentation code is the leak."
Drop a GitHub repo URL. Within 1 hour, get an HTML report listing every cost-pattern finding ranked by $/mo descending, with before/after fix snippets ready for a PR.
Buy Datadog Cost Audit — $149 → Or view the synthetic sample report first ($8,400/mo, 7 findings)