When you need to process many items (e.g. summarize 100 articles or classify a list of snippets), send requests in parallel and handle errors so one failure doesn’t block the rest.

Pattern: parallel requests with a pool

import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

url = "https://api.zerogpu.ai/v1/responses"
headers = {
    "content-type": "application/json",
    "x-api-key": "YOUR_API_KEY",
    "x-project-id": "YOUR_PROJECT_ID",
}

def one_request(content: str, model: str):
    payload = {
        "model": model,
        "input": [{"role": "user", "content": content}],
        "text": {"format": {"type": "text"}},
    }
    r = requests.post(url, headers=headers, json=payload, timeout=30)
    r.raise_for_status()
    return r.json()

texts = ["First text...", "Second text..."]  # your inputs
model = "zlm-v1-summary-cloud"  # or zlm-v1-iab-classify-cloud
results = []
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = {executor.submit(one_request, t, model): t for t in texts}
    for future in as_completed(futures):
        text = futures[future]  # the input this future was submitted for
        try:
            results.append(future.result())
        except requests.RequestException as e:
            # log which input failed, then skip or retry
            print(f"Failed for {text[:30]!r}: {e}")

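The `futures` dict above maps each future back to the input it was submitted for, so you can pair outputs with inputs even though `as_completed` yields them out of order. A minimal self-contained sketch of that idea, using a stand-in `work` function in place of the HTTP call:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(text: str) -> str:
    return text.upper()  # stand-in for one_request

texts = ["first", "second", "third"]
paired = {}
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = {executor.submit(work, t): t for t in texts}
    for future in as_completed(futures):
        # futures[future] recovers the original input for this result
        paired[futures[future]] = future.result()

# paired maps each input to its result, regardless of completion order
```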
Tips

  • Concurrency: Tune max_workers to match your usage and dashboard metrics.
  • Errors: Check status codes and response body; retry with backoff on 5xx.
  • Credentials: Use env vars for x-api-key and x-project-id; see Security.
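The retry-with-backoff tip can be sketched as a small wrapper around any request function (such as `one_request` above). The retry count and delays here are illustrative assumptions, not values mandated by the service:

```python
import time
import requests

def with_retry(fn, *args, retries: int = 3, base_delay: float = 1.0):
    """Call fn(*args), retrying on 5xx HTTP errors with exponential backoff."""
    for attempt in range(retries):
        try:
            return fn(*args)
        except requests.HTTPError as e:
            status = e.response.status_code if e.response is not None else None
            # Client errors (4xx) won't fix themselves; only retry 5xx.
            if status is None or status < 500 or attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Submitted to the pool as `executor.submit(with_retry, one_request, t, model)`, this retries transient server errors per item without blocking the rest of the batch.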
For a single request shape, see Summarize text or IAB classification. For a runnable Node.js batch demo, see the Batch requests demo in the cookbook.