Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.zerogpu.ai/llms.txt

Use this file to discover all available pages before exploring further.

The Batch API processes hundreds, thousands, or up to 50,000 inference requests at a discounted rate and within a 24-hour window. It’s the right choice when you don’t need a real-time response and want to avoid per-request rate limits.

Quickstart

First batch in under 10 minutes: auth, upload, create, poll, download.

Upload file (playground)

POST /v1/files, attach JSONL with purpose=batch.

Create batch (playground)

POST /v1/batches after you have an input file id.

Retrieve batch (playground)

Poll GET /v1/batches/{batch_id} for status and output file ids.

JSONL format

Input line schema, output schema, error schema, validation rules.

Files API reference

All five /v1/files endpoints (prose).

Batches API reference

Create, list, retrieve, cancel (prose).

Examples

End-to-end walkthroughs in curl and the Python openai SDK.

Quick facts

Base URL (production)https://api.zerogpu.ai
Auth headersx-api-key, x-project-id
Completion window24 hours (fixed)
Supported batch endpoint/v1/chat/completions (only)
Max requests per batch50,000
Max input file size200 MB total, 1 MB per line
Max upload size100 MB
File retention30 days

When to use the Batch API

You need…Use
A single immediate responseThe synchronous endpoint directly (e.g. POST /v1/chat/completions)
Hundreds-to-thousands of completions, can wait minutes-to-hoursThe Batch API
To avoid per-second rate limits during a backfillThe Batch API
Streaming responsesThe synchronous endpoint, streaming is not supported in batch mode

Go deeper

Files API reference

Upload, list, retrieve, download, delete, every endpoint, every parameter.

Batches API reference

Create, list, retrieve, including the full Batch object schema and lifecycle.

Errors reference

Every HTTP status, every validation message, every code that can appear in the error JSONL.