Datadog Cost Audit · by Milo Antaeus

Your Datadog Cost Audit Report

Static-analysis Datadog billing-leak audit · https://github.com/DataDog/terraform-provider-datadog · Generated 2026-05-17 01:12 UTC

Client-library files: 0 · Datadog YAML configs: 0 · Terraform: 201 · Patterns checked: 9 · Confidence: deterministic (no LLM-in-the-loop)

Executive summary

1 ranked Datadog cost-leak finding across 201 relevant files (0 client-library source files, 0 Datadog YAML configs, 201 Terraform Datadog-provider files). Implementing it could save approximately $100/month ($1,200/year).

These are recurring Datadog billing savings, verifiable on your Datadog "Plan & Usage" page in the next billing cycle. Open https://app.datadoghq.com/billing/usage and look for custom_metrics_count (Patterns 1-2), indexed_logs_gb (Patterns 3-4), apm_indexed_spans (Pattern 5), and synthetics_browser_test_runs / synthetics_api_test_runs (Patterns 6-7). Each finding's dollar claim maps to a specific Plan & Usage SKU line item. Every savings estimate carries a conservative confidence rating between 0.55 and 0.90.

#	Opportunity	Severity	$/mo saved
1	datadog_monitor `watchdog_monitor` missing: name, message (governance smell)	LOW	$100

TOTAL ESTIMATED MONTHLY SAVINGS: $100

Opportunity #1 — datadog_monitor `watchdog_monitor` missing: name, message (governance smell) $100/mo

Severity: LOW · Confidence: 55% · Rule: dd_terraform_metric_no_metadata

Where: examples/guides/watchdog_monitor.tf:1

What we found: `datadog_monitor.watchdog_monitor` is flagged as missing `name` and `message`. Datadog monitors without a clear `name` and `message` create downstream cost: alerts fire without context, on-call engineers can't tell what's broken, and the monitor often ends up either disabled (losing coverage) or duplicated with a different message (multiplying notification cost). This is the same governance smell that Pattern 9 catches for metrics: undocumented observability assets correlate strongly with custom-metric and log-volume runaway. Per https://docs.datadoghq.com/metrics/guide/custom_metrics_governance/, every observability asset should carry enough context for someone unfamiliar with the system to act on it.

Before (examples/guides/watchdog_monitor.tf:1)

resource "datadog_monitor" "watchdog_monitor" {
  name    = "Watchdog detected an anomaly: {{event.title}}"
  type    = "event-v2 alert"
  message = "Watchdog monitor created from Terraform"

  query = "events(\"source:watchdog story_category:apm env:test_env\").rollup(\"count\").by(\"story_key,service,resource_name\").last(\"30m\") > 0"
  # ... rest of monitor ...
}

After

resource "datadog_monitor" "watchdog_monitor" {
  name    = "Watchdog Monitor alert"
  type    = "event-v2 alert" # unchanged: the existing events() query requires this type
  message = <<-EOT
    Service is degraded. Likely causes: <list typical causes>.
    Runbook: <link>.
    Notify: @slack-oncall
  EOT
  # ... rest of monitor ...
}
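A check of this kind can be approximated in a few lines of Python. The sketch below is not the audit's actual rule implementation: `missing_monitor_fields`, its regex-based parsing, and the sample resources are all hypothetical, and a production check would use a real HCL parser rather than line matching. It treats everything between one `resource` header and the next as a block body and reports which of `name` and `message` have no assignment in each `datadog_monitor` block.

```python
import re

REQUIRED_FIELDS = ("name", "message")

# Matches any top-level resource header and captures its type and label.
RESOURCE_RE = re.compile(r'^resource\s+"([^"]+)"\s+"([^"]+)"\s*\{', re.MULTILINE)


def missing_monitor_fields(terraform_text):
    """Return {monitor_label: [missing fields]} for datadog_monitor resources.

    Naive by design: a block body is the text up to the next resource header,
    so nested blocks or non-resource blocks that assign `name` can mask a gap.
    """
    findings = {}
    headers = list(RESOURCE_RE.finditer(terraform_text))
    for i, header in enumerate(headers):
        resource_type, label = header.group(1), header.group(2)
        if resource_type != "datadog_monitor":
            continue
        end = headers[i + 1].start() if i + 1 < len(headers) else len(terraform_text)
        body = terraform_text[header.end():end]
        missing = [
            field for field in REQUIRED_FIELDS
            if not re.search(rf"^\s*{field}\s*=", body, re.MULTILINE)
        ]
        if missing:
            findings[label] = missing
    return findings


# Hypothetical input, not taken from the audited repository.
sample = '''
resource "datadog_monitor" "documented" {
  name    = "High error rate"
  type    = "metric alert"
  message = "Runbook: <link>. Notify: @slack-oncall"
  query   = "avg(last_5m):avg:app.errors{*} > 5"
}

resource "datadog_monitor" "undocumented" {
  type  = "metric alert"
  query = "avg(last_5m):avg:app.errors{*} > 5"
}
'''

print(missing_monitor_fields(sample))
```

On the sample, only the `undocumented` monitor is reported, with both fields listed as missing. Presence of a field is all this sketch checks; it says nothing about whether the message text is actionable.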

How Datadog billing works (and how to verify these savings)

Datadog charges by SKU, each with its own per-unit price and quota structure. The most common cost drivers (and what this audit targets):

Custom metrics: $0.05/series/month after the per-host quota (100 series/host on Pro). This is the SKU most inflated by Patterns 1 + 2. A single metric with a high-cardinality tag like user_id across 10K users = 10,000 billable series = $500/month for one metric. Distributions count 5 aggregations per tag combination by default (count/sum/min/max/avg), and enabling percentiles adds 5 more (p50/p75/p90/p95/p99); high-cardinality tags fan this out multiplicatively.
Indexed logs: $1.27 to $3.75 per million events, depending on retention (15-day to 1-year). This is what Patterns 3 + 4 target. A typical app emits 50-500GB/day of logs; exclusion filters can drop 90%+ of that before it hits the index.
Ingested logs (scan): $0.10/GB scanned. Less expensive than indexed but still material at high volume. Pattern 4 (debug log level) inflates this 3-10x.
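APM indexed spans: $1.27-$1.70 per million spans. Pattern 5 (DD_TRACE_SAMPLE_RATE=1.0) sets this on fire for high-RPS services: 1000 RPS at 100% sampling, counting one indexed span per request over a 30-day month, is ~2.6B spans/month, or roughly $3.3K-$4.4K/month per service. Multi-span traces scale this up proportionally.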
Synthetics: $7.20 per 10,000 browser test runs, $5.00 per 10,000 API test runs. Patterns 6 + 7 target sub-5-minute cadences and 5+ location fan-out.
Hosts: $15-$31/host/mo depending on tier. Not directly targeted by this audit but worth knowing — many of the patterns above were originally cost-optimized to fit within the host-quota model.
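The per-SKU arithmetic above can be reproduced with a short Python sketch. The unit prices are the ones quoted in this report; the function names, the 30-day month, and the default of one span per request are assumptions made for illustration, not Datadog billing rules.

```python
# Unit prices as quoted in this report (USD); check them against your contract.
CUSTOM_METRIC_PRICE_PER_SERIES = 0.05
APM_PRICE_PER_MILLION_SPANS = (1.27, 1.70)  # low and high end of the quoted range
SYNTHETICS_PRICE_PER_10K_RUNS = {"browser": 7.20, "api": 5.00}

# Assumption: a 30-day billing month.
MINUTES_PER_MONTH = 30 * 24 * 60
SECONDS_PER_MONTH = MINUTES_PER_MONTH * 60


def custom_metric_cost(tag_values, series_per_value=1):
    """Monthly cost of one over-quota metric; each tag value yields its own series."""
    return tag_values * series_per_value * CUSTOM_METRIC_PRICE_PER_SERIES


def apm_span_cost(rps, sample_rate=1.0, spans_per_request=1):
    """Return (spans per month, low-end cost, high-end cost)."""
    spans = rps * sample_rate * spans_per_request * SECONDS_PER_MONTH
    millions = spans / 1_000_000
    low, high = APM_PRICE_PER_MILLION_SPANS
    return spans, millions * low, millions * high


def synthetics_cost(kind, interval_minutes, locations):
    """Return (test runs per month, monthly cost) for one synthetic test."""
    runs = MINUTES_PER_MONTH / interval_minutes * locations
    return runs, runs / 10_000 * SYNTHETICS_PRICE_PER_10K_RUNS[kind]


if __name__ == "__main__":
    print(f"user_id tag, 10K users: ${custom_metric_cost(10_000):,.2f}/month")

    spans, low, high = apm_span_cost(rps=1000)
    print(f"1000 RPS at 100% sampling: {spans:,.0f} spans, ${low:,.2f} to ${high:,.2f}/month")

    runs, cost = synthetics_cost("browser", interval_minutes=1, locations=5)
    print(f"browser test, 1-minute cadence, 5 locations: {runs:,.0f} runs, ${cost:,.2f}/month")
```

With the report's prices, the first call reproduces the $500/month figure, and the APM call gives 2,592,000,000 spans at between $3,291.84 and $4,406.40 per month. Lowering `sample_rate` or raising `interval_minutes` shows how each fix scales the bill linearly.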

To verify any finding's savings claim, open https://app.datadoghq.com/billing/usage, filter by date range (comparing a 30-day window before each fix with a 30-day window after it is ideal), and watch the relevant SKU line item drop. Custom-metrics fixes show up on the next day's usage graphs; log/APM fixes show up over a 24-72 hour window as agent restarts propagate.
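A before-and-after comparison like this reduces to comparing average daily usage across two windows. The Python sketch below assumes you have already copied daily values for one SKU out of Plan & Usage into two lists; `usage_change` is a hypothetical helper and the numbers are made up for illustration.

```python
from statistics import mean


def usage_change(before_daily, after_daily):
    """Compare mean daily usage across two windows.

    Returns (before_avg, after_avg, pct_change); a negative pct_change
    means usage dropped after the fix.
    """
    if not before_daily or not after_daily:
        raise ValueError("both windows need at least one daily value")
    before_avg = mean(before_daily)
    after_avg = mean(after_daily)
    if before_avg == 0:
        raise ValueError("cannot compute a percentage change from zero usage")
    pct_change = (after_avg - before_avg) / before_avg * 100
    return before_avg, after_avg, pct_change


# Illustrative daily custom-metric counts, not real usage data.
before = [12_000] * 30
after = [9_000] * 30

before_avg, after_avg, pct = usage_change(before, after)
print(f"before: {before_avg:,.0f}/day, after: {after_avg:,.0f}/day, change: {pct:+.1f}%")
```

Comparing averages over equal-length windows keeps weekday and weekend traffic patterns from skewing the result; a single-day comparison would be far noisier.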

30-day re-audit voucher

Included with your $149 audit: a voucher for a free re-audit 30 days after delivery. Implement the recommended Datadog config and code changes, then re-submit the same repo URL by reply email. We re-run the analysis and confirm that the cost-leak patterns are resolved. If we still flag any of the CRITICAL findings from this report, a refund is issued automatically.

Why this matters: Datadog savings only materialize once the code/config changes land in production AND the agent restarts pick them up. The re-audit voucher creates an accountability loop — we can't claim "issue resolved" unless the v1 ruleset agrees on re-scan. Same deterministic engine, same file paths, same line numbers. No moving goalposts.

Verification path for customers: after applying changes, watch the relevant SKUs at https://app.datadoghq.com/billing/usage over a 7-30 day window. Custom-metric counts drop within hours of agent restart; log-exclusion savings appear within 24-72 hours as the new rules propagate; APM-sampling savings show on the next ingestion summary (usually 4-6 hours). We can supply the exact Plan & Usage filter for each finding on request.