Skip to main content
The Batch API processes hundreds, thousands, or up to 50,000 inference requests at a discounted rate within a 24-hour window. It’s the right choice when you don’t need a real-time response and want to avoid per-request rate limits.

Quickstart

Run your first batch end to end in curl and Python: upload, create, poll, download.

Objects & lifecycle

The File and Batch objects, the status lifecycle, and every endpoint.

JSONL format

Line schemas for input, output, and error files, plus validation rules.

Supported endpoints

The one endpoint a batch line can target, with its request and response body.

Errors

Every status and error-file code, with recovery guidance.

Quick facts

Base URL (production)https://api.zerogpu.ai
Auth headersx-api-key, x-project-id
Completion window24 hours (fixed)
Supported batch endpoint/v1/chat/completions (only)
Max requests per batch50,000
Max input file size200 MB total, 1 MB per line
Max upload size100 MB
File retention30 days

When to use the Batch API

You need…Use
A single immediate responseThe synchronous endpoint directly (e.g. POST /v1/chat/completions)
Hundreds-to-thousands of completions, can wait minutes-to-hoursThe Batch API
To avoid per-second rate limits during a backfillThe Batch API
Streaming responsesThe synchronous endpoint, streaming is not supported in batch mode
Every endpoint also has an interactive playground under API Reference → Batch API.