List-price math only. Prompt caching and the Batch API can cut real bills 30-50%. The deep audit models your actual usage with discounts included.
| Model | Input ($/MTok) | Output ($/MTok) | Best for |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | General agents, code, multi-step reasoning |
| GPT-4o-mini | $0.15 | $0.60 | Classification, extraction, high-volume back-end |
| GPT-4o-realtime | $5.00 | $20.00 | Bidirectional voice (text rates shown; audio is ~8x) |
A one-page rate reference plus a one-page "5 ways your OpenAI bill goes 3x over list price" guide covering the most common waste patterns we see in real audits. The PDF is sent to your inbox.
For each model: monthly_cost = (input_tokens × input_rate / 1,000,000) + (output_tokens × output_rate / 1,000,000)
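A minimal Python sketch of the formula above, assuming the list rates from the table. The names `RATES_PER_MTOK` and `monthly_cost` are illustrative, not part of any OpenAI SDK, and no caching or Batch API discounts are applied. Running it prints $45.00 for GPT-4o and $2.70 for GPT-4o-mini, matching the worked example.

```python
# List prices in USD per million tokens, copied from the rate table above.
# Verify against openai.com/pricing before relying on them.
RATES_PER_MTOK = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o-realtime": {"input": 5.00, "output": 20.00},  # text rates only
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """List-price monthly cost in USD (no caching or Batch API discounts)."""
    rate = RATES_PER_MTOK[model]
    return (
        input_tokens * rate["input"] / 1_000_000
        + output_tokens * rate["output"] / 1_000_000
    )


if __name__ == "__main__":
    # 10M input + 2M output tokens per month on each model.
    for model in RATES_PER_MTOK:
        print(f"{model}: ${monthly_cost(model, 10_000_000, 2_000_000):,.2f}")
```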
Example: 10 million input tokens + 2 million output tokens on GPT-4o = (10 × $2.50) + (2 × $10) = $45/month at list price. The same workload on GPT-4o-mini = (10 × $0.15) + (2 × $0.60) = $2.70/month. The cost gap is real, roughly 17x ($45 / $2.70 ≈ 16.7x), and GPT-4o-mini handles 80%+ of typical agent workloads.
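The example arithmetic can be checked with a short sketch, with token volumes given in millions. Because both GPT-4o list rates are exactly 50/3 (about 16.7) times the GPT-4o-mini rates, the ratio comes out the same whatever the input/output mix. The `cost` helper and the rate tuples are illustrative names only.

```python
# List prices from the table above: (input, output) in USD per million tokens.
GPT_4O = (2.50, 10.00)
GPT_4O_MINI = (0.15, 0.60)


def cost(rates: tuple[float, float], input_mtok: float, output_mtok: float) -> float:
    """Monthly list-price cost with token volumes given in millions."""
    input_rate, output_rate = rates
    return input_mtok * input_rate + output_mtok * output_rate


big = cost(GPT_4O, 10, 2)         # 10 × 2.50 + 2 × 10.00
small = cost(GPT_4O_MINI, 10, 2)  # 10 × 0.15 + 2 × 0.60
print(f"GPT-4o ${big:.2f} vs GPT-4o-mini ${small:.2f}: {big / small:.1f}x")
```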
GPT-4o-realtime is a different shape entirely: its text rates are 2x GPT-4o's, but the real cost driver is audio tokens, which run roughly 8x the text rates. If you're not doing voice, skip it.
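To see why audio dominates a voice bill, here is a rough sketch that prices audio tokens at a flat multiple of the text rates. The 8x `AUDIO_MULTIPLIER` is an assumption taken from the "~8x" note above, not a published figure; actual audio rates may differ between input and output, so check openai.com/pricing and adjust the parameter. The function name `realtime_cost` is hypothetical.

```python
# GPT-4o-realtime text rates from the table above (USD per million tokens).
TEXT_INPUT_RATE = 5.00
TEXT_OUTPUT_RATE = 20.00
# Assumption: audio tokens cost ~8x the text rates, per the note above.
AUDIO_MULTIPLIER = 8.0


def realtime_cost(text_in: int, text_out: int, audio_in: int, audio_out: int,
                  audio_multiplier: float = AUDIO_MULTIPLIER) -> float:
    """Estimated monthly USD cost of a voice workload; token counts are raw."""
    text = text_in * TEXT_INPUT_RATE + text_out * TEXT_OUTPUT_RATE
    audio = audio_multiplier * (audio_in * TEXT_INPUT_RATE
                                + audio_out * TEXT_OUTPUT_RATE)
    return (text + audio) / 1_000_000


# Hypothetical workload: 1M tokens each of text in/out and audio in/out.
print(f"${realtime_cost(1_000_000, 1_000_000, 1_000_000, 1_000_000):,.2f}")
```

With equal text and audio volumes, audio accounts for 8/9 of the total under this assumption.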
Published rates as of 2026-05 from OpenAI's pricing page. GPT-4o: $2.50 input / $10 output. GPT-4o-mini: $0.15 / $0.60. GPT-4o-realtime: $5 / $20 for text (audio tokens run ~8x the text rates). Always verify at openai.com/pricing before signing a contract.
Open platform.openai.com/usage and export the last 30 days as CSV. The export breaks down input vs. output tokens per model. If you have no usage history to export yet, use a heuristic: 1,000 tokens is about 750 English words, and a typical chat turn is 200-500 input tokens and 100-300 output tokens.
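Without an export, the heuristic above can be turned into monthly token volumes for the cost formula. This is a rough sketch: the words-to-tokens ratio is the 750-words-per-1,000-tokens rule of thumb, and the turn count and per-turn sizes are hypothetical inputs you would replace with your own. For exact counts on real text, a tokenizer library such as `tiktoken` is the more precise option.

```python
# Rough heuristic from the text above: 1,000 tokens ≈ 750 English words.
TOKENS_PER_WORD = 1_000 / 750


def words_to_tokens(words: int) -> int:
    """Approximate token count for a given number of English words."""
    return round(words * TOKENS_PER_WORD)


def monthly_tokens(turns_per_month: int, input_per_turn: int,
                   output_per_turn: int) -> tuple[int, int]:
    """Total (input, output) tokens for a month of chat turns."""
    return turns_per_month * input_per_turn, turns_per_month * output_per_turn


# Hypothetical: 50,000 turns/month at the top of the typical ranges.
inp, out = monthly_tokens(50_000, 500, 300)
print(f"{inp:,} input tokens, {out:,} output tokens per month")
```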
No. The calculator uses list prices only. OpenAI prompt caching cuts the price of cached input tokens by 50%, and the Batch API gives 50% off both input and output for async workloads. Production workloads with stable system prompts routinely run 30-50% under list price.
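A sketch of how the two discounts change the GPT-4o example, assuming the 50% figures above. The two scenarios are modeled separately; how caching and batching interact on the same request is not covered here. The `cached_share` parameter (fraction of input tokens served from the cache) is a hypothetical input you would estimate from your own traffic.

```python
# GPT-4o list prices from the table above (USD per million tokens).
INPUT_RATE = 2.50
OUTPUT_RATE = 10.00
CACHED_INPUT_DISCOUNT = 0.50  # prompt caching: cached input tokens at 50% off
BATCH_DISCOUNT = 0.50         # Batch API: 50% off input and output


def cost_with_caching(input_mtok: float, output_mtok: float,
                      cached_share: float) -> float:
    """Monthly USD cost when `cached_share` of input tokens hit the cache."""
    cached = input_mtok * cached_share * INPUT_RATE * (1 - CACHED_INPUT_DISCOUNT)
    uncached = input_mtok * (1 - cached_share) * INPUT_RATE
    return cached + uncached + output_mtok * OUTPUT_RATE


def cost_with_batch(input_mtok: float, output_mtok: float) -> float:
    """Monthly USD cost for an async workload sent through the Batch API."""
    return (input_mtok * INPUT_RATE + output_mtok * OUTPUT_RATE) * (1 - BATCH_DISCOUNT)


# 10M input / 2M output, as in the earlier example.
for share in (0.0, 0.5, 0.8):
    print(f"caching, {share:.0%} cached: ${cost_with_caching(10, 2, share):.2f}")
print(f"batch: ${cost_with_batch(10, 2):.2f}")
```

Caching only discounts input, so output-heavy workloads save less from it than input-heavy ones with long, stable system prompts.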
GPT-4o-mini for classification, extraction, routing, and high-volume back-end work: roughly 17x cheaper than GPT-4o on both input and output, and it matches GPT-4o on most narrow tasks. GPT-4o for general agents, code, and multi-step reasoning. GPT-4o-realtime only when you specifically need bidirectional voice.
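One way to apply this split is to route by task type and price the blended traffic. The sketch below is hypothetical: the task labels, the `ROUTES` table, and the 80/20 traffic split are illustrative assumptions, not measurements, and a real router would also need a quality check on the tasks sent to GPT-4o-mini.

```python
# List prices from the table above: (input, output) in USD per million tokens.
RATES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

# Hypothetical task labels mapped to models per the guidance above.
ROUTES = {
    "classification": "gpt-4o-mini",
    "extraction": "gpt-4o-mini",
    "routing": "gpt-4o-mini",
    "agent": "gpt-4o",
    "code": "gpt-4o",
    "reasoning": "gpt-4o",
}


def blended_cost(traffic: dict[str, tuple[float, float]]) -> float:
    """Monthly USD cost; `traffic` maps task label -> (input MTok, output MTok)."""
    total = 0.0
    for task, (input_mtok, output_mtok) in traffic.items():
        input_rate, output_rate = RATES[ROUTES[task]]
        total += input_mtok * input_rate + output_mtok * output_rate
    return total


# Hypothetical split of the 10M-in / 2M-out example: 80% narrow tasks, 20% agent work.
traffic = {"classification": (8, 1.6), "agent": (2, 0.4)}
print(f"${blended_cost(traffic):,.2f} routed vs ${blended_cost({'agent': (10, 2)}):,.2f} all on GPT-4o")
```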
No. Token volumes and rate math run locally in your browser. The page fires an anonymous pageview beacon and CTA-click events so we can measure whether the calculator is useful — no inputs, no email (unless you submit one to the cheat-sheet form), no raw IP stored.