Every line in a batch input file must targetDocumentation Index
Fetch the complete documentation index at: https://docs.zerogpu.ai/llms.txt
Use this file to discover all available pages before exploring further.
/v1/chat/completions in the
url field, and the endpoint field on the create call must match. This
matches OpenAI’s published batchable surface for this API version.
url | Use case | Sync equivalent |
|---|---|---|
/v1/chat/completions | OpenAI-style chat completions | POST /v1/chat/completions |
body field is the same body
you would send synchronously. The response in the output JSONL’s
response.body is the same response the synchronous endpoint would
return.
Other ZeroGPU endpoints are sync-onlyInternal routes like
/summary, /v1/responses,
/usecase/classify/iab, and /gliner remain
reachable as synchronous calls but are not batchable.
Submitting a JSONL line with one of these url values is
rejected at create time with
“is not an allowed batch endpoint”./v1/chat/completions
Standard OpenAI-style chat completions. Suitable for any model that supports
chat-completion semantics.
Request body (inside JSONL body)
| Field | Required | Description |
|---|---|---|
model | Yes | A model ID enabled for your project. |
messages | Yes | Array of {role, content}. role is one of user, assistant, system, tool, developer. |
stream | - | Must not be true. Rejected at batch creation. |
max_tokens setting, behavior depends on the model’s
error_on_max_token flag: either the line fails with
code: "context_length_exceeded" (HTTP 400), or the input is silently
truncated. Verify with your model’s configuration.
Response body (inside output JSONL response.body)
system_fingerprint is always null today. ZeroGPU does not yet expose
backend-determinism fingerprints. The field is present for SDK
compatibility (OpenAI’s ChatCompletion type declares it as optional but
real OpenAI populates it on every response).
For classification models invoked via /v1/chat/completions,
choices[0].message.content is a JSON-stringified classification response
rather than free-form text. Parse it as JSON to extract the structured
result.
Next steps
JSONL format →
How to wrap each request body in a valid input JSONL line.
Batches API reference →
Create, list, retrieve, cancel; status lifecycle and limits.
Examples →
End-to-end walkthrough in curl and Python.

