The Batch API gives you 50% off in exchange for a 24-hour SLA. For nightly classification, doc summarization, and eval runs, that SLA literally costs you nothing. Here's the math, the migration code, and the half-dozen gotchas worth knowing.
Here's the thing about Batch: it's not a different model, not a different quality tier, not a "stripped down" anything. It's the exact same gpt-4o or gpt-4o-mini you're calling synchronously, with one difference — OpenAI processes the request whenever it has spare capacity within 24 hours, and you get the result back as a downloadable file. In exchange for that asynchrony, every input and output token bills at 50% of the synchronous rate.
For some workloads that tradeoff is unacceptable (user-facing chat). For others it's free money you're leaving on the table every single night.
Current published OpenAI prices (mid-2026):
- gpt-4o sync: $2.50/M input, $10.00/M output
- gpt-4o batch: $1.25/M input, $5.00/M output
- gpt-4o-mini sync: $0.15/M input, $0.60/M output
- gpt-4o-mini batch: $0.075/M input, $0.30/M output

The discount is a flat 50% across SKUs, both input and output. No tiers, no minimums, no commitments.
Worked example. A nightly job classifies 40,000 customer support tickets per day. Each ticket: ~800 input tokens (the message plus a 600-token system prompt) and ~30 output tokens (a JSON category label).
Per-day spend on gpt-4o-mini:

- Input: 40,000 × 800 = 32M tokens. Sync: 32 × $0.15 = $4.80. Batch: 32 × $0.075 = $2.40.
- Output: 40,000 × 30 = 1.2M tokens. Sync: 1.2 × $0.60 = $0.72. Batch: 1.2 × $0.30 = $0.36.
- Total: $5.52/day sync vs. $2.76/day batch, a saving of $2.76/day.
Savings on this one cron: ~$83/mo. Modest individually. But teams typically have 4-8 of these workloads (nightly summarization, weekly eval suite, monthly retrospective digests, content moderation backlog, doc embedding refresh), and the aggregate frequently lands in the $500-$2,000/mo range. We've seen one audit where Batch alone accounted for $3,400/mo in recoverable spend.
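If you want to sanity-check that arithmetic against your own volumes, it fits in a few lines. The prices are the batch/sync rates quoted above; the ticket count and token sizes are just the assumptions from the example, so swap in your own:

```python
# Back-of-envelope check of the worked example: 40,000 tickets/day on
# gpt-4o-mini, ~800 input and ~30 output tokens per ticket.
PRICES = {"sync": (0.15, 0.60), "batch": (0.075, 0.30)}  # ($/M input, $/M output)

tickets_per_day = 40_000
input_tokens, output_tokens = 800, 30

for mode, (p_in, p_out) in PRICES.items():
    daily = (tickets_per_day * input_tokens / 1e6) * p_in \
          + (tickets_per_day * output_tokens / 1e6) * p_out
    print(f"{mode}: ${daily:.2f}/day  (~${daily * 30:.0f}/mo)")
# prints sync: $5.52/day (~$166/mo), batch: $2.76/day (~$83/mo)
```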
The decision rule is uncomplicated: if the result doesn't have to land in less than 24 hours, use Batch. Five canonical use cases:
- Nightly classification jobs (like the ticket example above).
- Document summarization pipelines.
- Eval suite runs.
- Content-moderation backlogs.
- Bulk embedding refreshes, e.g. re-embedding a corpus from text-embedding-3-small to text-embedding-3-large. Sync would take ~12 hours of saturated rate-limit pushing; Batch wraps it cleanly (sketched below).

Conversely, the workloads that are never a Batch fit: real-time chat, autocomplete, agents in a tool-use loop where the next step depends on this step's output, anything customer-facing that needs a synchronous response.
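On the embedding-refresh case specifically: the Batch API accepts /v1/embeddings requests in the same JSONL shape, so the input file is roughly the following. The corpus iteration, doc.id, and doc.text are illustrative stand-ins for however you store your documents:

```python
import json

# One JSONL line per document, pointed at the embeddings endpoint instead
# of chat completions. Everything else about the batch lifecycle is the same.
with open("/tmp/embed_batch.jsonl", "w") as f:
    for doc in corpus:
        f.write(json.dumps({
            "custom_id": f"doc-{doc.id}",
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": "text-embedding-3-large", "input": doc.text},
        }) + "\n")
```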
A typical sync nightly job looks like this:
import openai
def classify_one(ticket):
resp = openai.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": CLASSIFY_SYSTEM_PROMPT},
{"role": "user", "content": ticket.body},
],
max_tokens=64,
temperature=0,
)
return resp.choices[0].message.content
def run():
for ticket in fetch_yesterdays_tickets():
label = classify_one(ticket)
save_label(ticket.id, label)
The Batch equivalent splits into two phases — submit today, fetch tomorrow:
import openai, json
from pathlib import Path
def build_batch_jsonl(tickets, out_path):
with open(out_path, "w") as f:
for t in tickets:
f.write(json.dumps({
"custom_id": f"ticket-{t.id}",
"method": "POST",
"url": "/v1/chat/completions",
"body": {
"model": "gpt-4o-mini",
"messages": [
{"role": "system", "content": CLASSIFY_SYSTEM_PROMPT},
{"role": "user", "content": t.body},
],
"max_tokens": 64,
"temperature": 0,
},
}) + "\n")
def submit_today():
tickets = fetch_yesterdays_tickets()
jsonl_path = Path("/tmp/batch_in.jsonl")
build_batch_jsonl(tickets, jsonl_path)
upload = openai.files.create(file=open(jsonl_path, "rb"), purpose="batch")
batch = openai.batches.create(
input_file_id=upload.id,
endpoint="/v1/chat/completions",
completion_window="24h",
)
save_pending_batch(batch.id) # tomorrow's run reads this
def fetch_prior():
batch_id = load_pending_batch()
if not batch_id: return
batch = openai.batches.retrieve(batch_id)
if batch.status != "completed":
log.warning(f"Batch {batch_id} not done: {batch.status}")
return # try again tomorrow, OpenAI keeps it 7 days
output = openai.files.content(batch.output_file_id).text
for line in output.strip().split("\n"):
result = json.loads(line)
ticket_id = result["custom_id"].removeprefix("ticket-")
label = result["response"]["body"]["choices"][0]["message"]["content"]
save_label(ticket_id, label)
clear_pending_batch()
def run():
fetch_prior() # yesterday's batch
submit_today() # today's batch
The cron runs once a day. Each invocation completes yesterday's batch and submits today's. Steady state: results land on a ~24-hour delay. For a daily classification job, that delay is invisible — you're still seeing one day of results per day.
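The save_pending_batch / load_pending_batch / clear_pending_batch bookkeeping the code assumes can be as dumb as a state file. A minimal sketch (the path here is arbitrary):

```python
from pathlib import Path

# Persist the single pending batch id between cron runs in a state file.
STATE = Path("/var/lib/ticket-classifier/pending_batch_id")

def save_pending_batch(batch_id: str) -> None:
    STATE.parent.mkdir(parents=True, exist_ok=True)
    STATE.write_text(batch_id)

def load_pending_batch() -> str | None:
    return STATE.read_text().strip() if STATE.exists() else None

def clear_pending_batch() -> None:
    STATE.unlink(missing_ok=True)
```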
Check the response and error shapes per line. A common bug is assuming every line has response.body.choices[0]; failed requests carry an error object instead, and the batch also exposes a separate error_file_id, so parse defensively.
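A defensive rewrite of the parsing loop in fetch_prior might look like this (log and save_label are the same assumed helpers as above; the per-line shape follows the Batch output format):

```python
import json
import openai

def parse_batch_output(batch):
    # Only index into choices when the line is a clean 200 response.
    output = openai.files.content(batch.output_file_id).text
    for line in output.strip().split("\n"):
        result = json.loads(line)
        resp = result.get("response")
        if result.get("error") or not resp or resp.get("status_code") != 200:
            log.warning(f"request {result['custom_id']} failed: {result.get('error')}")
            continue
        label = resp["body"]["choices"][0]["message"]["content"]
        save_label(result["custom_id"].removeprefix("ticket-"), label)
    # Requests rejected outright show up in a separate error file, if present.
    if batch.error_file_id:
        log.warning(f"batch {batch.id} produced an error file: {batch.error_file_id}")
```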
One of the 9 deterministic patterns the analyzer applies: "file path matches /jobs/, /cron/, /nightly/, /batch/, /pipelines/ AND contains chat.completions.create AND doesn't contain batches.create". We flag every such file with the exact file:line and a paste-into-PR migration diff, with a 14-day money-back guarantee if total surfaced savings come in under $79. Or try the free Mini-Triage first — paste 3 file URLs, get a 1-page diagnosis.
Anthropic also ships a Message Batches API at the same 50% discount and 24-hour SLA. The wire format is different (JSONL with Anthropic's messages.create shape) but the cost economics are identical. If your codebase mixes OpenAI and Anthropic, audit both stacks — the same nightly-job-not-using-batch pattern applies symmetrically.
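For reference, the submit step on the Anthropic side looks roughly like this with their Python SDK. This is a sketch against the messages.batches interface as I understand it (the exact resource path can differ by SDK version), and the model name plus the helper functions are placeholders carried over from the OpenAI example:

```python
import anthropic

client = anthropic.Anthropic()

# Rough equivalent of submit_today(): each request carries a custom_id plus
# the normal messages.create params, submitted as one Message Batch.
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"ticket-{t.id}",
            "params": {
                "model": "claude-3-5-haiku-latest",  # placeholder model
                "max_tokens": 64,
                "system": CLASSIFY_SYSTEM_PROMPT,
                "messages": [{"role": "user", "content": t.body}],
            },
        }
        for t in fetch_yesterdays_tickets()
    ],
)
save_pending_batch(batch.id)  # same bookkeeping pattern as the OpenAI version
```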
Google Vertex/Gemini also has a batch prediction mode with similar discount structure. The 24h-SLA-for-50%-off pattern is now an industry-standard offering rather than an OpenAI-only thing.
Is the output the same quality as sync? Yes, modulo normal sampling variance. Same model, same weights, same tokenizer. At temperature=0 you'll typically get bit-identical outputs.
Can you still use structured outputs and tools? Yes. The body in each JSONL line accepts the full set of parameters — response_format, tools, tool_choice, etc. — that you'd pass to the sync endpoint.
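For example, forcing the classifier to emit strict JSON is just one extra key on the per-line body from build_batch_jsonl (t and CLASSIFY_SYSTEM_PROMPT are the same names used in that builder):

```python
# Same per-line body as build_batch_jsonl, with response_format added.
# Any parameter the sync chat.completions.create call accepts is legal here.
body = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": CLASSIFY_SYSTEM_PROMPT},
        {"role": "user", "content": t.body},
    ],
    "max_tokens": 64,
    "temperature": 0,
    "response_format": {"type": "json_object"},
}
```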
How does Batch interact with rate limits? Separately. Batch has its own per-tier limits (enqueued tokens, requests per batch), and your sync TPM/RPM is unaffected by batches in flight. This is partly why Batch exists — it lets OpenAI smooth out load.
What if 24 hours is too slow? If your eval suite tolerates 1-2 hours but not 24, run it sync with high concurrency. Batch is for jobs that are genuinely OK with overnight.
The Anthropic bill-doubled post covers cache_control as the dominant leak on Claude codebases. This post covers Batch as the dominant leak on OpenAI codebases that have nightly cron jobs. The 5-patterns post covers both plus 3 others in one shorter walkthrough of a real public-repo audit.