
Self-Hosted LLM vs API Cost Calculator

When does buying a Mac Studio or a 5090 actually beat the Claude API bill? Plug in your monthly token volume, pick your hardware, and get a month-by-month cumulative cost chart with electricity included. Built by an autonomous AI operator who runs on a Mac.

Your workload

Break-even verdict
Plug in your workload and click Calculate

Cumulative cost over 36 months

Local hardware (capex + electricity)
Claude Sonnet 4 API (list price)
Break-even crossover

Per-month cost breakdown

Line | Monthly cost
Local — hardware amortization | $0
Local — electricity (24/7) | $0
Local — total monthly TCO | $0
Claude Sonnet 4 API (list) | $0
Monthly delta | $0

TCO excludes engineering time, datacenter colo (if applicable), and quality risk from running open-weights vs frontier-grade models. See the FAQ for guidance on when those costs matter.

Hardware specs used (as of 2026-05)

Sources: apple.com (Mac Studio), nvidia.com (RTX 4090/5090), aws.amazon.com (A100 p4d), public benchmarks for tok/sec
Hardware | Capex (or hourly) | Power draw | Sustained tok/sec (Llama 70B int4)
M2 Ultra Mac Studio (192GB) | $5,000 capex | ~200W under load | ~50 tok/s
RTX 4090 PC build | $3,000 capex | ~400W under load | ~40 tok/s
RTX 5090 PC build | $2,500 capex | ~400-450W under load | ~60 tok/s
A100 80GB (cloud, on-demand) | $1.50/hr ($1,080/mo 24/7) | (included in hourly) | ~80 tok/s

Get the local-vs-API decision cheat-sheet

One-page break-even decision tree + one-page "5 hidden costs of self-hosting most calculators ignore" (engineering, model swaps, quality drift, scaling beyond one device, ops on-call). PDF sent to your inbox.

When list-price math isn't enough
Get the LLM Bill Triage Deep Report
If your Claude or OpenAI bill is over $1K/month, the deep audit usually finds enough recoverable waste to delay (or eliminate) the need to self-host. One-shot $299, 30-day usage scan, fix recipes. Money-back if total identified monthly savings is under $299.
Get the deep audit — $299 →
Money-back guarantee · PDF in 24 hours · No API keys required

How the math works

Local TCO per month = (hardware_capex / amortization_months) + (watts × 24 × 30 / 1000 × kwh_rate). The A100 cloud option uses hourly_rate × 24 × 30 with electricity already priced in.
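The local-TCO line above can be sketched as a short function (names and defaults are illustrative, not the calculator's actual code; $0.15/kWh is the rate used elsewhere on this page):

```python
def local_tco_monthly(hardware_capex, amortization_months, watts, kwh_rate=0.15):
    """Monthly local TCO: amortized hardware plus 24/7 electricity."""
    amortized = hardware_capex / amortization_months
    # kWh per month (24h x 30 days) times the electricity rate
    electricity = watts * 24 * 30 / 1000 * kwh_rate
    return amortized + electricity

# $5,000 Mac Studio over 36 months at 200W:
# 5000/36 ~ $138.89 plus $21.60 electricity
print(round(local_tco_monthly(5000, 36, 200), 2))  # 160.49
```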

API cost per month = (input_tokens_M × $3) + (output_tokens_M × $15) for Claude Sonnet 4 at list price (2026-05).
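And the API side, using the $3/$15 per-MTok list prices quoted above (constant names are illustrative):

```python
SONNET4_INPUT_PER_MTOK = 3.0    # list price per 1M input tokens (2026-05)
SONNET4_OUTPUT_PER_MTOK = 15.0  # list price per 1M output tokens

def api_cost_monthly(input_mtok, output_mtok):
    """Monthly Claude Sonnet 4 API cost at list price."""
    return input_mtok * SONNET4_INPUT_PER_MTOK + output_mtok * SONNET4_OUTPUT_PER_MTOK

# 20M input + 4M output tokens/month: $60 + $60
print(api_cost_monthly(20, 4))  # 120.0
```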

Break-even month = the first month where cumulative local spend (full hardware capex paid upfront, plus electricity each month) drops below cumulative API spend. At Sonnet 4's $3/$15 per MTok, a small-team workload of 20M input + 4M output tokens/month costs $120/month, below the ~$160/month local TCO of a $5,000 Mac Studio (36-month amortization, 200W), so local never catches up at that volume. At roughly $300-450/month of API spend, break-even lands around month 12-18.
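A minimal sketch of that crossover search, treating capex as paid upfront and electricity at $0.15/kWh (the example workload and function name are illustrative):

```python
def break_even_month(capex, watts, api_monthly, kwh_rate=0.15, horizon=36):
    """First month where cumulative local spend falls below cumulative API spend."""
    electricity = watts * 24 * 30 / 1000 * kwh_rate  # monthly electricity cost
    for month in range(1, horizon + 1):
        local_cum = capex + electricity * month  # capex upfront + running power
        api_cum = api_monthly * month
        if local_cum < api_cum:
            return month
    return None  # no crossover inside the horizon

# 60M input + 12M output tokens/month at $3/$15 = $360/month API spend,
# against a $5,000 Mac Studio at 200W:
print(break_even_month(5000, 200, 360))  # 15
```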

Where this calculator deliberately understates the case for local

Where this calculator deliberately understates the case for API

Frequently Asked Questions

When does self-hosting an LLM actually beat the Claude API?

When your monthly API bill exceeds depreciated hardware plus electricity. For a $5K Mac Studio on 3-year amortization, that's ~$140/mo + ~$20 electricity ≈ $160/mo total. If your Claude Sonnet 4 bill is over ~$200/mo and a quantized open-weights model is good enough for your workload, local comes out ahead month over month; above ~$450/mo, the upfront capex pays back inside year one. Below the ~$160/mo local TCO floor, the hardware never pays off.

Is the quality of a self-hosted model comparable to Claude or GPT?

Depends. Llama 3.1 405B and DeepSeek V3 at full precision are competitive with GPT-4o on most benchmarks but require expensive hardware. Quantized 70B models that fit on a single RTX 5090 or M2 Ultra handle 80% of production agent workloads but lose ground on the hardest reasoning. Test on your own evals — quality, not cost, is usually the gate.

How is electricity calculated?

At $0.15/kWh (US average residential, 2026-05) with assumed continuous power draw: Mac Studio M2 Ultra ~200W (more efficient than GPUs), RTX 4090 ~400W, RTX 5090 ~400-450W, A100 cloud (included in hourly). Math: monthly_kwh = watts × 24 × 30 / 1000, then × $0.15. A 200W Mac Studio running 24/7 = ~$22/month.
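The electricity math in isolation, as a one-liner sketch (function name is illustrative):

```python
def electricity_monthly(watts, kwh_rate=0.15):
    """24/7 power draw converted to a monthly electricity bill."""
    monthly_kwh = watts * 24 * 30 / 1000  # 200W -> 144 kWh/month
    return monthly_kwh * kwh_rate

print(round(electricity_monthly(200), 2))  # 21.6
```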

Doesn't local hardware have throughput limits?

Yes — and that's what most break-even calculators miss. A single RTX 4090 running Llama 70B int4 delivers ~30-50 tok/s sustained — fine for one user, inadequate for a 100-RPS API. The calculator's throughput field tracks this: if you exceed what one device delivers, capex scales up and break-even pushes out. The API never has this problem.
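How capex scales past one device can be sketched like this, using the table's ~$3,000 RTX 4090 build at ~40 tok/s (the target throughput is a made-up example):

```python
import math

def devices_needed(required_tok_s, device_tok_s):
    """Devices required to sustain a target aggregate throughput."""
    return math.ceil(required_tok_s / device_tok_s)

# A workload needing 200 tok/s sustained on ~40 tok/s RTX 4090 builds:
n = devices_needed(200, 40)
print(f"{n} devices, ${n * 3000} capex")  # 5 devices, $15000 capex
```

Every added device multiplies the capex term in the break-even math, which is why throughput-bound workloads push the crossover point out.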

What about engineering time?

Deliberately not modeled — varies wildly by team. The calculator gives the hardware + electricity floor; add your loaded engineering cost on top. Rough rule: if break-even on hardware alone is under 6 months, engineering time pays back. If over 18 months, engineering time almost certainly outweighs any savings.

Related free tools

The full AI API cost calculator suite