This page walks through a complete batch from start to finish, in both curl
and the Python openai SDK. By the end you’ll have submitted three chat
completions through the Batch API and read the results.
Prerequisites
| Credential | Where to find it |
|---|---|
| API key (x-api-key) | ZeroGPU dashboard → API Keys |
| Project ID (x-project-id) | ZeroGPU dashboard → Projects |
Both headers are required on every request. Missing either returns 401.
Keep your API key out of source control
Store it in environment variables, a secret manager, or your CI’s secret
store. Never commit it.
export ZGPU_API_KEY="your-api-key"
export ZGPU_PROJECT_ID="your-project-uuid"
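A Python process can read the same two variables and fail fast when one is missing, instead of discovering the problem as a 401. The sketch below is a minimal illustration; `zerogpu_headers` is a hypothetical helper name, not part of any SDK.

```python
import os


def zerogpu_headers(env=None):
    """Build the two required auth headers from environment variables.

    Illustrative helper (an assumption of this sketch), not an SDK function.
    Pass a dict as `env` to override os.environ, for example in tests.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("ZGPU_API_KEY", "ZGPU_PROJECT_ID") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "x-api-key": env["ZGPU_API_KEY"],
        "x-project-id": env["ZGPU_PROJECT_ID"],
    }
```

The returned dict can be passed wherever request headers are accepted, such as the `default_headers` argument of the openai client.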
Base URLs
| Environment | URL |
|---|---|
| Production | https://api.zerogpu.ai |
| Staging | https://staging.api.zerogpu.ai |
| Development | https://dev.api.zerogpu.ai |
The Batch and Files endpoints live under these hostnames:
POST /v1/files, GET /v1/files, GET /v1/files/{id}, GET /v1/files/{id}/content, DELETE /v1/files/{id}
POST /v1/batches, GET /v1/batches, GET /v1/batches/{batch_id}
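If you switch between environments, it can help to build URLs from one lookup table rather than hard-coding hostnames. This is a small sketch; `BASE_URLS` and `endpoint_url` are hypothetical names chosen for illustration.

```python
# Hostnames copied from the Base URLs table above.
BASE_URLS = {
    "production": "https://api.zerogpu.ai",
    "staging": "https://staging.api.zerogpu.ai",
    "development": "https://dev.api.zerogpu.ai",
}


def endpoint_url(environment, path):
    """Join an environment's hostname with an API path such as /v1/batches."""
    if environment not in BASE_URLS:
        raise ValueError(f"Unknown environment: {environment!r}")
    return BASE_URLS[environment] + "/" + path.lstrip("/")
```

For example, `endpoint_url("staging", "/v1/batches")` yields the staging batches URL, and `endpoint_url("production", "/v1")` yields a value suitable for the openai client's `base_url`.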
1. Write the input JSONL
Every batch is driven by a JSONL file in which each line is one inference
request:
{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": { ... }}
| Field | Required | Description |
|---|---|---|
| custom_id | Yes | Your identifier for the request. Must be unique within the batch. Echoed back in the output so you can match results to inputs. |
| method | Yes | Must be "POST". |
| url | Yes | Must be /v1/chat/completions, the only supported batch endpoint. All lines in a batch must share the same url. |
| body | Yes | The JSON body you would send to that endpoint synchronously. "stream": true is rejected; batches are non-streaming. |
Full schema and validation rules: JSONL format.
cat > input.jsonl <<'EOF'
{"custom_id":"q-1","method":"POST","url":"/v1/chat/completions","body":{"model":"<model-id>","messages":[{"role":"user","content":"What is the capital of France?"}]}}
{"custom_id":"q-2","method":"POST","url":"/v1/chat/completions","body":{"model":"<model-id>","messages":[{"role":"user","content":"What is the capital of Germany?"}]}}
{"custom_id":"q-3","method":"POST","url":"/v1/chat/completions","body":{"model":"<model-id>","messages":[{"role":"user","content":"What is the capital of Italy?"}]}}
EOF
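The same file can be generated from Python, which is safer than hand-writing JSON once prompts contain quotes or newlines. This is a minimal sketch; `batch_line` and `build_jsonl` are hypothetical helper names, and `<model-id>` remains a placeholder you must replace.

```python
import json


def batch_line(custom_id, prompt, model="<model-id>"):
    """Return one batch request serialized as a single line of JSON."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {"model": model, "messages": [{"role": "user", "content": prompt}]},
    })


def build_jsonl(prompts, model="<model-id>"):
    """Join (custom_id, prompt) pairs into JSONL text, one request per line."""
    return "".join(batch_line(cid, prompt, model) + "\n" for cid, prompt in prompts)


questions = [
    ("q-1", "What is the capital of France?"),
    ("q-2", "What is the capital of Germany?"),
    ("q-3", "What is the capital of Italy?"),
]
jsonl_text = build_jsonl(questions)
```

Writing `jsonl_text` to `input.jsonl` produces the same three requests as the shell heredoc. Because `json.dumps` escapes newlines inside strings, each request stays on one physical line.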
2. Upload the file
Send the JSONL to POST /v1/files with purpose=batch. The response
contains the file_id you’ll reference when creating the batch.
curl -X POST https://api.zerogpu.ai/v1/files \
  -H "x-api-key: $ZGPU_API_KEY" \
  -H "x-project-id: $ZGPU_PROJECT_ID" \
  -F purpose=batch \
  -F file=@input.jsonl
Response:
{
  "id": "file-abc123...",
  "object": "file",
  "bytes": 612,
  "created_at": 1736290000,
  "filename": "input.jsonl",
  "purpose": "batch",
  "status": "processed",
  "expires_at": 1738882000
}
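With the Python openai SDK, the upload could look like the sketch below. It assumes `client` is an `openai.OpenAI` instance configured with the ZeroGPU base URL and headers, as in the full script later on this page; `upload_batch_input` is a hypothetical helper name.

```python
def upload_batch_input(client, jsonl_text, filename="input.jsonl"):
    """Upload JSONL text with purpose="batch" and return the new file ID.

    `client` is assumed to be an openai.OpenAI instance pointed at ZeroGPU.
    The (filename, bytes) tuple is one of the upload forms the SDK accepts.
    """
    uploaded = client.files.create(
        file=(filename, jsonl_text.encode("utf-8")),
        purpose="batch",
    )
    return uploaded.id
```

Passing the content as bytes avoids leaving an open file handle behind, which matters in long-running workers.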
3. Create the batch
Submit the batch with the file ID, the target endpoint, and a 24-hour
completion window. The response returns immediately with
status: "in_progress"; the actual processing is asynchronous.
curl -X POST https://api.zerogpu.ai/v1/batches \
  -H "x-api-key: $ZGPU_API_KEY" \
  -H "x-project-id: $ZGPU_PROJECT_ID" \
  -H "content-type: application/json" \
  -d '{
    "input_file_id": "file-abc123...",
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h"
  }'
Response:
{
  "id": "batch_01HZX...",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "status": "in_progress",
  "input_file_id": "file-abc123...",
  "output_file_id": null,
  "error_file_id": null,
  "created_at": 1736290000,
  "expires_at": 1736376400,
  "request_counts": { "total": 3, "completed": 0, "failed": 0 }
}
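A Python equivalent of the create call might look like the following sketch. It assumes `client` is an `openai.OpenAI` instance pointed at ZeroGPU; `create_batch` and `window_hours` are hypothetical helper names. The second helper simply shows how the two Unix timestamps in the response relate to the completion window.

```python
def create_batch(client, input_file_id):
    """Create a chat-completions batch with a 24-hour completion window."""
    return client.batches.create(
        input_file_id=input_file_id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )


def window_hours(created_at, expires_at):
    """Return the gap between two Unix timestamps (in seconds) as hours."""
    return (expires_at - created_at) / 3600
```

For the sample response above, `window_hours(1736290000, 1736376400)` is 24.0, which matches the requested `completion_window`.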
Validation runs at create time
The server streams the entire input JSONL and validates every line before
responding. If anything is wrong (a duplicate custom_id, a line
over 1 MB, stream: true, or a mismatched url), you’ll
get a 400 with the offending line. Once the response
returns, the batch is durably committed.
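Running the same checks locally before uploading can save a round trip. The sketch below covers only the rules stated on this page; the server's validation is authoritative and may check more. `validate_jsonl` is a hypothetical helper, and reading "1 MB" as 1,000,000 bytes is an assumption chosen because it is the stricter interpretation.

```python
import json

MAX_LINE_BYTES = 1_000_000  # assumption: "1 MB" read conservatively as 10**6 bytes
ALLOWED_URL = "/v1/chat/completions"


def validate_jsonl(text):
    """Return a list of (line_number, problem) tuples; empty means none found."""
    problems = []
    seen_ids = set()
    for number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # ignore blank lines, including the one after a trailing newline
        if len(line.encode("utf-8")) > MAX_LINE_BYTES:
            problems.append((number, "line over 1 MB"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "line is not a JSON object"))
            continue
        custom_id = record.get("custom_id")
        if not isinstance(custom_id, str) or not custom_id:
            problems.append((number, "missing custom_id"))
        elif custom_id in seen_ids:
            problems.append((number, f"duplicate custom_id {custom_id!r}"))
        else:
            seen_ids.add(custom_id)
        if record.get("method") != "POST":
            problems.append((number, 'method must be "POST"'))
        if record.get("url") != ALLOWED_URL:
            problems.append((number, f"url must be {ALLOWED_URL}"))
        body = record.get("body")
        if not isinstance(body, dict):
            problems.append((number, "missing body"))
        elif body.get("stream") is True:
            problems.append((number, "stream: true is not allowed"))
    return problems
```

An empty return value means the file passed these local checks, not that the server is guaranteed to accept it.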
4. Poll until complete
Poll GET /v1/batches/{batch_id} until status is completed, expired,
or failed. A 30-second interval is a reasonable default.
while true; do
  RESP=$(curl -s "https://api.zerogpu.ai/v1/batches/$BATCH_ID" \
    -H "x-api-key: $ZGPU_API_KEY" \
    -H "x-project-id: $ZGPU_PROJECT_ID")
  STATUS=$(echo "$RESP" | jq -r '.status')
  echo "$RESP" | jq -c '{status, request_counts}'
  case "$STATUS" in completed|failed|expired) break;; esac
  sleep 30
done
When status is completed, the response contains output_file_id (and,
if any line failed, error_file_id).
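In Python, the polling loop can be wrapped in a function with an overall timeout so a stuck job does not block forever. This is a sketch under assumptions: `client` is an `openai.OpenAI` instance pointed at ZeroGPU, `wait_for_batch` is a hypothetical helper, the 24-hour default timeout is chosen to mirror the completion window, and the terminal set also includes "cancelled", as the full script on this page does.

```python
import time

# Statuses after which the batch will not change again.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(client, batch_id, interval=30.0, timeout=24 * 60 * 60,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll the batch until it reaches a terminal status, then return it.

    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        if clock() >= deadline:
            raise TimeoutError(f"Batch {batch_id} still {batch.status} after {timeout}s")
        sleep(interval)
```

The caller still has to check that the returned status is "completed" before reading `output_file_id`.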
5. Download the results
Stream the output and error files via GET /v1/files/{file_id}/content.
curl -s "https://api.zerogpu.ai/v1/files/$OUTPUT_FILE_ID/content" \
  -H "x-api-key: $ZGPU_API_KEY" \
  -H "x-project-id: $ZGPU_PROJECT_ID" \
  -o output.jsonl
jq -r '"\(.custom_id): \(.response.body.choices[0].message.content)"' output.jsonl
Output line shape (one per successful line; order is not preserved, so match
by custom_id):
{"id": "batch_req_a", "custom_id": "q-1", "response": {"status_code": 200, "request_id": "req_xyz", "body": { ... }}}
Error line shape (one per failed line):
{"id": "batch_req_b", "custom_id": "q-2", "response": null, "error": {"code": "invalid_request_error", "message": "[legacy:http_400] ...", "param": null}}
Full schemas: JSONL format.
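Because output order is not preserved, results are easiest to handle once indexed by custom_id. The sketch below merges an output file and an error file back into the caller's original order; `index_by_custom_id` and `collect_results` are hypothetical helper names, and the field paths follow the two line shapes shown above.

```python
import json


def index_by_custom_id(jsonl_text):
    """Map custom_id -> parsed record for every non-blank line."""
    records = (json.loads(line) for line in jsonl_text.splitlines() if line.strip())
    return {rec["custom_id"]: rec for rec in records}


def collect_results(input_ids, output_text, error_text=""):
    """Return (answers, errors), both keyed by custom_id in input order."""
    outputs = index_by_custom_id(output_text)
    failures = index_by_custom_id(error_text)
    answers, errors = {}, {}
    for cid in input_ids:
        if cid in outputs:
            body = outputs[cid]["response"]["body"]
            answers[cid] = body["choices"][0]["message"]["content"]
        elif cid in failures:
            errors[cid] = failures[cid]["error"]["message"]
    return answers, errors
```

Any custom_id that appears in neither dict was absent from both files, which is worth logging rather than silently ignoring.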
Scale it up
The same five steps handle a real workload. This script sends 1,000 prompts
and collects the answers into a CSV, generating the JSONL from your own data,
polling once a minute, and matching results back by custom_id:
import csv, json, os, time

from openai import OpenAI

client = OpenAI(
    api_key="ignored-by-zerogpu",
    base_url="https://api.zerogpu.ai/v1",
    default_headers={
        "x-api-key": os.environ["ZGPU_API_KEY"],
        "x-project-id": os.environ["ZGPU_PROJECT_ID"],
    },
)

# 1. Write JSONL from your data (anything yielding (doc_id, prompt) tuples)
prompts = load_my_dataset()  # 1000 records
with open("chat.jsonl", "w") as f:
    for doc_id, prompt in prompts:
        f.write(json.dumps({
            "custom_id": doc_id,
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {"model": "<model-id>", "messages": [{"role": "user", "content": prompt}]},
        }) + "\n")

# 2-3. Upload + create
uploaded = client.files.create(file=open("chat.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=uploaded.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 4. Poll
while batch.status not in ("completed", "failed", "expired", "cancelled"):
    time.sleep(60)
    batch = client.batches.retrieve(batch.id)
    print(f"{batch.status}: {batch.request_counts.completed}/{batch.request_counts.total}")
if batch.status != "completed":
    raise SystemExit(f"Batch ended with status {batch.status}")

# 5. Download, parse, write CSV
output = client.files.content(batch.output_file_id).read().decode()
results = {
    rec["custom_id"]: rec["response"]["body"]["choices"][0]["message"]["content"]
    for rec in (json.loads(line) for line in output.splitlines() if line.strip())
}
with open("results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["doc_id", "answer"])
    writer.writerows(results.items())
For larger jobs, also read error_file_id and resubmit only the failed
lines. See Errors reference for the recovery pattern.
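One way to build the resubmission file is to filter the original input by the custom_id values found in the error file. This is only a sketch of that idea, not the pattern documented in the Errors reference; `failed_custom_ids` and `retry_jsonl` are hypothetical helper names.

```python
import json


def failed_custom_ids(error_text):
    """Collect the custom_id of every line in an error file."""
    return {json.loads(line)["custom_id"] for line in error_text.splitlines() if line.strip()}


def retry_jsonl(input_text, error_text):
    """Return JSONL holding only the input lines whose custom_id failed."""
    failed = failed_custom_ids(error_text)
    kept = [
        line
        for line in input_text.splitlines()
        if line.strip() and json.loads(line)["custom_id"] in failed
    ]
    return "".join(line + "\n" for line in kept)
```

Lines rejected for an invalid request are unlikely to succeed unchanged, so inspect each error's code and message before resubmitting.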
Next steps
JSONL format →
Exact line schema for input, output, and error files.
Supported endpoints →
Body and response shape for /v1/chat/completions.
Objects & lifecycle →
Status lifecycle, the full Batch object schema, and every endpoint.
Errors reference →
Recover from failed lines without re-running the whole batch.