Quickstart

This page walks through a complete batch from start to finish, using both curl and the Python openai SDK. By the end you’ll have submitted three chat completions through the Batch API and read the results.

Prerequisites

Credential	Where to find it
API key (`x-api-key`)	ZeroGPU dashboard → API Keys
Project ID (`x-project-id`)	ZeroGPU dashboard → Projects

Both headers are required on every request. A request missing either one returns 401.

Keep your API key out of source control. Store it in environment variables, a secret manager, or your CI’s secret store, and never commit it.

export ZGPU_API_KEY="your-api-key"
export ZGPU_PROJECT_ID="your-project-uuid"
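
Because a request missing either header is rejected with 401, it can help to check both variables before making any call. The following is a minimal sketch, not part of the ZeroGPU tooling; the function name `zerogpu_headers` is hypothetical.

```python
import os


def zerogpu_headers(env=os.environ):
    """Build the two required headers, failing early if a variable is unset."""
    # zerogpu_headers is a hypothetical helper name, not part of any SDK.
    names = ("ZGPU_API_KEY", "ZGPU_PROJECT_ID")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "x-api-key": env["ZGPU_API_KEY"],
        "x-project-id": env["ZGPU_PROJECT_ID"],
    }
```

The returned dictionary has the same shape as the `default_headers` argument used in the SDK examples later on this page.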

Base URLs

Environment	URL
Production	`https://api.zerogpu.ai`
Staging	`https://staging.api.zerogpu.ai`
Development	`https://dev.api.zerogpu.ai`

The Batch and Files endpoints live under these hostnames:

POST /v1/files, GET /v1/files, GET /v1/files/{id}, GET /v1/files/{id}/content, DELETE /v1/files/{id}
POST /v1/batches, GET /v1/batches, GET /v1/batches/{batch_id}

1. Build the input JSONL

Every batch is driven by a JSONL file where each line is one inference request:

{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": { ... }}

Field	Required	Description
`custom_id`	Yes	Your identifier for the request. Must be unique within the batch. Echoed back in the output so you can match results to inputs.
`method`	Yes	Must be `"POST"`.
`url`	Yes	Must be `/v1/chat/completions`, the only supported batch endpoint. All lines in a batch must share the same `url`.
`body`	Yes	The JSON body you would send to that endpoint synchronously. `"stream": true` is rejected because batches are non-streaming.

Full schema and validation rules: JSONL format.

cat > input.jsonl <<'EOF'
{"custom_id":"q-1","method":"POST","url":"/v1/chat/completions","body":{"model":"<model-id>","messages":[{"role":"user","content":"What is the capital of France?"}]}}
{"custom_id":"q-2","method":"POST","url":"/v1/chat/completions","body":{"model":"<model-id>","messages":[{"role":"user","content":"What is the capital of Germany?"}]}}
{"custom_id":"q-3","method":"POST","url":"/v1/chat/completions","body":{"model":"<model-id>","messages":[{"role":"user","content":"What is the capital of Italy?"}]}}
EOF

import json

questions = [
    ("q-1", "What is the capital of France?"),
    ("q-2", "What is the capital of Germany?"),
    ("q-3", "What is the capital of Italy?"),
]

with open("input.jsonl", "w") as f:
    for custom_id, question in questions:
        f.write(json.dumps({
            "custom_id": custom_id,
            "method":    "POST",
            "url":       "/v1/chat/completions",
            "body": {
                "model":    "<model-id>",
                "messages": [{"role": "user", "content": question}],
            },
        }) + "\n")

2. Upload the file

Send the JSONL to POST /v1/files with purpose=batch. The response contains the file_id you’ll reference when creating the batch.

curl -X POST https://api.zerogpu.ai/v1/files \
  -H "x-api-key: $ZGPU_API_KEY" \
  -H "x-project-id: $ZGPU_PROJECT_ID" \
  -F purpose=batch \
  -F file=@input.jsonl

import os

from openai import OpenAI

client = OpenAI(
    api_key="ignored-by-zerogpu",
    base_url="https://api.zerogpu.ai/v1",
    default_headers={
        "x-api-key":    os.environ["ZGPU_API_KEY"],
        "x-project-id": os.environ["ZGPU_PROJECT_ID"],
    },
)

uploaded = client.files.create(
    file=open("input.jsonl", "rb"),
    purpose="batch",
)
print(uploaded.id)  # file-abc...

Response:

{
  "id":         "file-abc123...",
  "object":     "file",
  "bytes":      612,
  "created_at": 1736290000,
  "filename":   "input.jsonl",
  "purpose":    "batch",
  "status":     "processed",
  "expires_at": 1738882000
}

3. Create the batch

Submit the batch with the file ID, the target endpoint, and a 24-hour completion window. The response returns with status: "in_progress" once the input passes validation; the actual processing is asynchronous.

curl -X POST https://api.zerogpu.ai/v1/batches \
  -H "x-api-key: $ZGPU_API_KEY" \
  -H "x-project-id: $ZGPU_PROJECT_ID" \
  -H "content-type: application/json" \
  -d '{
    "input_file_id":     "file-abc123...",
    "endpoint":          "/v1/chat/completions",
    "completion_window": "24h"
  }'

batch = client.batches.create(
    input_file_id=uploaded.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={"job": "capitals-demo"},
)
print(batch.id, batch.status)  # batch_01HZX... in_progress

Response:

{
  "id":             "batch_01HZX...",
  "object":         "batch",
  "endpoint":       "/v1/chat/completions",
  "status":         "in_progress",
  "input_file_id":  "file-abc123...",
  "output_file_id": null,
  "error_file_id":  null,
  "created_at":     1736290000,
  "expires_at":     1736376400,
  "request_counts": { "total": 3, "completed": 0, "failed": 0 }
}
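
In the sample response, `expires_at` is exactly 86,400 seconds after `created_at`, which lines up with the 24-hour completion window. The short check below uses only the sample values; whether the server always derives `expires_at` this way is an assumption.

```python
from datetime import datetime, timezone

# Values copied from the sample response; real batches will differ.
created_at = 1736290000
expires_at = 1736376400

window_seconds = expires_at - created_at
print(window_seconds / 3600)  # 24.0 (hours)

# Convert the Unix timestamp to a readable UTC time for logging.
print(datetime.fromtimestamp(expires_at, tz=timezone.utc).isoformat())
```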

Validation runs at create time. The server streams the entire input JSONL and validates every line before responding. If anything is wrong (a duplicate custom_id, a line over 1 MB, stream: true, or a mismatched url), you’ll get a 400 with the offending line. Once the response returns, the batch is durably committed.
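
The checks listed above can be approximated locally before uploading, which saves a round trip on large files. This is a rough sketch rather than the server's actual validator: the name `find_problems` is hypothetical, "1 MB" is assumed to mean 1,000,000 bytes, and `custom_id` is assumed to be a non-empty string.

```python
import json

MAX_LINE_BYTES = 1_000_000  # assumption: "1 MB" taken as 10**6 bytes
BATCH_URL = "/v1/chat/completions"


def find_problems(lines):
    """Return (line_number, problem) tuples; an empty list means nothing was flagged."""
    problems = []
    seen_ids = set()
    for number, line in enumerate(lines, start=1):
        if len(line.encode("utf-8")) > MAX_LINE_BYTES:
            problems.append((number, "line over 1 MB"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        custom_id = record.get("custom_id")
        # Assumption: custom_id is a non-empty string.
        if not isinstance(custom_id, str) or not custom_id:
            problems.append((number, "missing custom_id"))
        elif custom_id in seen_ids:
            problems.append((number, f"duplicate custom_id {custom_id!r}"))
        else:
            seen_ids.add(custom_id)
        if record.get("method") != "POST":
            problems.append((number, "method must be POST"))
        if record.get("url") != BATCH_URL:
            problems.append((number, f"url must be {BATCH_URL}"))
        body = record.get("body")
        if not isinstance(body, dict):
            problems.append((number, "missing body"))
        elif body.get("stream") is True:
            problems.append((number, "stream: true is not allowed"))
    return problems
```

To use it, read input.jsonl, pass `text.splitlines()` to `find_problems`, and upload only when the returned list is empty. The server remains the source of truth, so a clean local result does not guarantee that the create call succeeds.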

4. Poll until complete

Poll GET /v1/batches/{batch_id} until status is completed, expired, or failed. A 30-second interval is a reasonable default.

# BATCH_ID is the "id" value returned when you created the batch in step 3.
while true; do
  RESP=$(curl -s "https://api.zerogpu.ai/v1/batches/$BATCH_ID" \
    -H "x-api-key: $ZGPU_API_KEY" \
    -H "x-project-id: $ZGPU_PROJECT_ID")
  STATUS=$(echo "$RESP" | jq -r '.status')
  echo "$RESP" | jq -c '{status, request_counts}'
  case "$STATUS" in completed|failed|expired) break;; esac
  sleep 30
done

import time

while batch.status not in ("completed", "failed", "expired"):
    time.sleep(30)
    batch = client.batches.retrieve(batch.id)
    print(batch.status, batch.request_counts)

When status is completed, the response contains output_file_id (and, if any line failed, error_file_id).
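
The polling loops above run until a terminal status appears. For unattended jobs it may be worth adding an overall deadline, as in this sketch. The helper name `wait_for_batch` and its parameters are hypothetical; `retrieve` is any callable that takes a batch ID and returns an object with a `status` attribute, such as `client.batches.retrieve`.

```python
import time

TERMINAL_STATUSES = {"completed", "failed", "expired"}


def wait_for_batch(retrieve, batch_id, interval=30.0, timeout=24 * 3600,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the batch reaches a terminal status or the timeout passes."""
    deadline = clock() + timeout
    while True:
        batch = retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        if clock() >= deadline:
            raise TimeoutError(f"Batch {batch_id} still {batch.status} after {timeout}s")
        sleep(interval)
```

With the SDK client from step 2, a call would look like `batch = wait_for_batch(client.batches.retrieve, batch.id)`. The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting.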

5. Download the results

Stream the output and error files via GET /v1/files/{file_id}/content.

# OUTPUT_FILE_ID is the output_file_id value from the completed batch in step 4.
curl -s "https://api.zerogpu.ai/v1/files/$OUTPUT_FILE_ID/content" \
  -H "x-api-key: $ZGPU_API_KEY" \
  -H "x-project-id: $ZGPU_PROJECT_ID" \
  -o output.jsonl

jq -r '"\(.custom_id): \(.response.body.choices[0].message.content)"' output.jsonl

if batch.output_file_id:
    data = client.files.content(batch.output_file_id).read()
    with open("output.jsonl", "wb") as f:
        f.write(data)

    for line in data.decode().splitlines():
        rec = json.loads(line)
        print(rec["custom_id"], rec["response"]["body"]["choices"][0]["message"]["content"])

Output line shape (one per successful line; order is not preserved, so match by custom_id):

{"id": "batch_req_a", "custom_id": "q-1", "response": {"status_code": 200, "request_id": "req_xyz", "body": { ... }}}

Error line shape (one per failed line):

{"id": "batch_req_b", "custom_id": "q-2", "response": null, "error": {"code": "invalid_request_error", "message": "[legacy:http_400] ...", "param": null}}

Full schemas: JSONL format.
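
Given the two line shapes above, one parser can handle both the output file and the error file. The sketch below is an illustration, not an official client helper: the name `split_results` is hypothetical, and it assumes that any line with a non-null `error` is a failure and that every other line carries a chat completion body.

```python
import json


def split_results(jsonl_text):
    """Return (answers, errors), two dicts keyed by custom_id."""
    answers, errors = {}, {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        rec = json.loads(line)
        if rec.get("error"):
            errors[rec["custom_id"]] = rec["error"]
        else:
            body = rec["response"]["body"]
            answers[rec["custom_id"]] = body["choices"][0]["message"]["content"]
    return answers, errors
```

Keying both dictionaries by `custom_id` means line order in the files does not matter.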

Scale it up

The same five steps handle a real workload. This script sends 1,000 prompts and collects the answers into a CSV, generating the JSONL from your own data, polling once a minute, and matching results back by custom_id:

import csv, json, os, time
from openai import OpenAI

client = OpenAI(
    api_key="ignored-by-zerogpu",
    base_url="https://api.zerogpu.ai/v1",
    default_headers={
        "x-api-key":    os.environ["ZGPU_API_KEY"],
        "x-project-id": os.environ["ZGPU_PROJECT_ID"],
    },
)

# 1. Write JSONL from your data (anything yielding (doc_id, prompt) tuples)
prompts = load_my_dataset()  # 1000 records
with open("chat.jsonl", "w") as f:
    for doc_id, prompt in prompts:
        f.write(json.dumps({
            "custom_id": doc_id,
            "method":    "POST",
            "url":       "/v1/chat/completions",
            "body":      {"model": "<model-id>", "messages": [{"role": "user", "content": prompt}]},
        }) + "\n")

# 2-3. Upload + create
uploaded = client.files.create(file=open("chat.jsonl", "rb"), purpose="batch")
batch    = client.batches.create(
    input_file_id=uploaded.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 4. Poll
while batch.status not in ("completed", "failed", "expired", "cancelled"):
    time.sleep(60)
    batch = client.batches.retrieve(batch.id)
    print(f"{batch.status}: {batch.request_counts.completed}/{batch.request_counts.total}")

if batch.status != "completed":
    raise SystemExit(f"Batch ended with status {batch.status}")

# 5. Download, parse, write CSV
output = client.files.content(batch.output_file_id).read().decode()
results = {
    rec["custom_id"]: rec["response"]["body"]["choices"][0]["message"]["content"]
    for rec in (json.loads(line) for line in output.splitlines() if line.strip())
}
with open("results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["doc_id", "answer"])
    writer.writerows(results.items())

For larger jobs, also read error_file_id and resubmit only the failed lines. See Errors reference for the recovery pattern.
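
One way to build the resubmission file is to keep only the input lines whose `custom_id` appears in the error file. The sketch below shows that filtering step alone. The function name `build_retry_jsonl` is hypothetical, and the documented recovery pattern in Errors reference takes precedence where it differs, for example if some error codes should not be retried.

```python
import json


def build_retry_jsonl(input_jsonl, error_jsonl):
    """Return the subset of input lines whose custom_id appears in the error file."""
    failed = {
        json.loads(line)["custom_id"]
        for line in error_jsonl.splitlines()
        if line.strip()
    }
    kept = [
        line
        for line in input_jsonl.splitlines()
        if line.strip() and json.loads(line)["custom_id"] in failed
    ]
    # Keep one request per line, each newline-terminated, as in the original input.
    return "".join(line + "\n" for line in kept)
```

The error text can be fetched the same way as the output in the script above, by calling `client.files.content(batch.error_file_id).read().decode()`, and the returned string written to a new file for a fresh upload.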

Next steps

JSONL format →

Exact line schema for input, output, and error files.

Supported endpoints →

Body and response shape for /v1/chat/completions.

Objects & lifecycle →

Status lifecycle, the full Batch object schema, and every endpoint.

Errors reference →

Recover from failed lines without re-running the whole batch.
