Overview

The Batch API processes up to 50,000 inference requests per batch at a discounted rate, completing within a 24-hour window. It’s the right choice when you don’t need a real-time response and want to avoid per-request rate limits.

Quickstart

Run your first batch end to end in curl and Python: upload, create, poll, download.

Objects & lifecycle

The File and Batch objects, the status lifecycle, and every endpoint.

JSONL format

Line schemas for input, output, and error files, plus validation rules.

Supported endpoints

The one endpoint a batch line can target, with its request and response body.

Errors

Every status and error-file code, with recovery guidance.

Quick facts


Base URL (production)	`https://api.zerogpu.ai`
Auth headers	`x-api-key` (required), `x-project-id` (optional)
Completion window	24 hours (fixed)
Supported batch endpoint	`/v1/chat/completions` (only)
Max requests per batch	50,000
Max input file size	200 MB total, 1 MB per line
Max upload size	100 MB
File retention	30 days
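
The limits above can be checked locally before uploading. The following is a minimal sketch, not part of the API: `check_batch_file` is a hypothetical helper name, and treating 1 MB as 1,000,000 bytes is an assumption (the stricter reading). The table does not say how the 200 MB input limit and the 100 MB upload limit interact, so the sketch reports both independently.

```python
import os

# Limits copied from the Quick facts table. Counting 1 MB as 1,000,000 bytes
# is an assumption; the service may measure sizes differently.
MB = 1_000_000
MAX_REQUESTS = 50_000
MAX_LINE_BYTES = 1 * MB
MAX_FILE_BYTES = 200 * MB
MAX_UPLOAD_BYTES = 100 * MB


def check_batch_file(path):
    """Return a list of limit violations for a JSONL input file (hypothetical helper)."""
    problems = []

    size = os.path.getsize(path)
    if size > MAX_FILE_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_FILE_BYTES}-byte input file limit")
    if size > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_UPLOAD_BYTES}-byte upload limit")

    requests = 0
    with open(path, "rb") as f:
        for number, line in enumerate(f, start=1):
            body = line.rstrip(b"\r\n")
            if not body.strip():
                # Blank lines are skipped here; the real validator may reject them.
                continue
            requests += 1
            if len(body) > MAX_LINE_BYTES:
                problems.append(f"line {number} is {len(body)} bytes, over the per-line limit")

    if requests > MAX_REQUESTS:
        problems.append(f"{requests} requests, over the {MAX_REQUESTS}-request limit")
    return problems
```

Calling `check_batch_file("requests.jsonl")` (the file name is illustrative) returns an empty list when the file is within every limit modeled here. It does not validate the JSON content of each line; see JSONL format for the line schemas and validation rules.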

When to use the Batch API

You need…	Use
A single immediate response	The synchronous endpoint directly (e.g. `POST /v1/chat/completions`)
Hundreds to thousands of completions, and can wait minutes to hours	The Batch API
To avoid per-second rate limits during a backfill	The Batch API
Streaming responses	The synchronous endpoint; streaming is not supported in batch mode

Every endpoint also has an interactive playground under API Reference → Batch API.
