Supported Endpoints

Every line in a batch input file must target `/v1/chat/completions` in the `url` field, and the `endpoint` field on the create call must match. This mirrors OpenAI’s published batchable surface for this API version.

`url`	Use case	Sync equivalent
`/v1/chat/completions`	OpenAI-style chat completions	`POST /v1/chat/completions`

The request body inside each JSONL line’s `body` field is the same body you would send synchronously. The response in the output JSONL’s `response.body` is the same response the synchronous endpoint would return.
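As a minimal sketch of that wrapping, the helper below places an unchanged synchronous request body into one input line. Only `url` and `body` come from this page; the `custom_id` and `method` fields and the helper name `make_batch_line` are assumptions borrowed from OpenAI's batch input format, so check the JSONL format page for the exact line schema.

```python
import json

BATCH_URL = "/v1/chat/completions"

def make_batch_line(custom_id, body):
    """Wrap a synchronous request body in one batch input line (hypothetical helper)."""
    line = {
        "custom_id": custom_id,  # assumed field, not described on this page
        "method": "POST",        # assumed field, not described on this page
        "url": BATCH_URL,        # the only batchable url
        "body": body,            # identical to the synchronous request body
    }
    # json.dumps emits a single line, which is what JSONL requires.
    return json.dumps(line)

sync_body = {
    "model": "<model-id>",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}
print(make_batch_line("req-1", sync_body))
```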

Other ZeroGPU endpoints are sync-only. Internal routes like `/summary`, `/v1/responses`, `/usecase/classify/iab`, and `/gliner` remain reachable as synchronous calls but are not batchable. Submitting a JSONL line with one of these `url` values is rejected at create time with “is not an allowed batch endpoint”.
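Because a disallowed `url` fails the whole create call, it can help to scan the input file locally first. The following is a sketch of such a client-side check, not part of the ZeroGPU API; the function name `find_disallowed_urls` is hypothetical, and it assumes each non-empty line is a JSON object with a top-level `url` field.

```python
import json

# The single batchable endpoint listed on this page.
ALLOWED_BATCH_URLS = {"/v1/chat/completions"}

def find_disallowed_urls(jsonl_text):
    """Return (line_number, url) pairs for lines whose url is not batchable."""
    problems = []
    for number, raw in enumerate(jsonl_text.splitlines(), start=1):
        if not raw.strip():
            continue  # ignore blank lines
        url = json.loads(raw).get("url")
        if url not in ALLOWED_BATCH_URLS:
            problems.append((number, url))
    return problems
```

An empty result means every line targets the batchable endpoint; anything else points at the lines to fix before uploading.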

`/v1/chat/completions`

Standard OpenAI-style chat completions. Suitable for any model that supports chat-completion semantics.

Request body (inside JSONL `body`)

{
  "model": "<model-id>",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user",   "content": "What is the capital of France?" }
  ]
}

Field	Required	Description
`model`	Yes	A model ID enabled for your project.
`messages`	Yes	Array of `{role, content}`. `role` is one of `user`, `assistant`, `system`, `tool`, `developer`.
`stream`	-	Must not be `true`. A file containing a line with `stream: true` is rejected at batch creation.
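The field rules in the table can be checked before a body is written to the input file. This is a sketch of a local validator under the rules listed above; `validate_chat_body` is a hypothetical helper, and the server may enforce further constraints that are not documented here.

```python
ALLOWED_ROLES = {"user", "assistant", "system", "tool", "developer"}

def validate_chat_body(body):
    """Return a list of problems found in a chat-completions request body."""
    problems = []
    model = body.get("model")
    if not isinstance(model, str) or not model:
        problems.append("model is required")
    messages = body.get("messages")
    if not isinstance(messages, list):
        problems.append("messages is required and must be an array")
    else:
        for i, message in enumerate(messages):
            role = message.get("role") if isinstance(message, dict) else None
            if role not in ALLOWED_ROLES:
                problems.append(f"messages[{i}].role is not an allowed role")
    if body.get("stream") is True:
        problems.append("stream must not be true")
    return problems
```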

Per-model token limits apply (configured in model metadata). If input tokens exceed the model’s `max_tokens` setting, behavior depends on the model’s `error_on_max_token` flag: either the line fails with `code: "context_length_exceeded"` (HTTP 400), or the input is silently truncated. Check your model’s configuration to see which behavior applies.

Response body (inside output JSONL `response.body`)

{
  "id":      "chatcmpl-...",
  "object":  "chat.completion",
  "created": 1736295000,
  "model":   "<model-id>",
  "choices": [
    {
      "index": 0,
      "message": {
        "role":        "assistant",
        "content":     "Paris.",
        "annotations": []
      },
      "finish_reason": "stop",
      "logprobs":      null
    }
  ],
  "usage": {
    "prompt_tokens":     12,
    "completion_tokens": 2,
    "total_tokens":      14,
    "prompt_tokens_details":     { "cached_tokens": 0, "audio_tokens": 0 },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "service_tier":       "default",
  "system_fingerprint": null
}
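To illustrate reading this structure back, the sketch below pulls the assistant text, finish reason, and token total out of one output line. It assumes only that each output line is a JSON object carrying the body above under `response.body`; the helper name `summarize_output_line` is hypothetical.

```python
import json

def summarize_output_line(raw_line):
    """Extract the main fields from one output JSONL line."""
    record = json.loads(raw_line)
    body = record["response"]["body"]
    choice = body["choices"][0]
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": body["usage"]["total_tokens"],
    }
```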

`system_fingerprint` is always `null` today. ZeroGPU does not yet expose backend-determinism fingerprints. The field is present for SDK compatibility: OpenAI’s `ChatCompletion` type declares it as optional, although OpenAI itself populates it on every response.

For classification models invoked via `/v1/chat/completions`, `choices[0].message.content` is a JSON-stringified classification response rather than free-form text. Parse it as JSON to extract the structured result.
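A sketch of that parsing step follows. The shape of the classification payload is not documented on this page, so the `{"label": ...}` object in the usage lines is purely hypothetical, as is the helper name `parse_classification`. The function returns `None` when the content is ordinary free-form text.

```python
import json

def parse_classification(body):
    """Decode message.content as JSON; return None if it is not valid JSON."""
    content = body["choices"][0]["message"]["content"]
    try:
        return json.loads(content)
    except (json.JSONDecodeError, TypeError):
        return None

# Hypothetical classification payload, for illustration only.
example_body = {
    "choices": [{"message": {"role": "assistant", "content": '{"label": "sports"}'}}]
}
result = parse_classification(example_body)
```

Returning `None` instead of raising keeps a mixed batch (chat and classification models together) easy to process in one loop.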

Next steps

JSONL format →

How to wrap each request body in a valid input JSONL line.

Batches API reference →

Create, list, retrieve, cancel; status lifecycle and limits.

Examples →

End-to-end walkthrough in curl and Python.
