Keep API keys in environment variables, never in source or client-side code. Handle the common status codes explicitly: 401 (bad API key), 403 (bad project ID), and 429 (rate limit). Set sensible timeouts and retries with backoff on 429 and 5xx. For large, non-real-time jobs, use the Batch and Files API instead of the synchronous endpoint - streaming is not supported in batch mode. Already on the OpenAI SDK? The drop-in client shown in the Introduction is the recommended way to call ZeroGPU from application code.

Keep secrets out of source

Read your x-api-key and x-project-id from the environment (or a secrets manager) and inject them at deploy time. Never commit them, never ship them in a browser bundle or mobile app - a key embedded in client-side code is a public key.
ZeroGPU calls authenticate from your backend. If you need to call from a browser or mobile client, proxy the request through a server you control so the key stays server-side.
# .env (git-ignored) - load with your process manager or dotenv
export ZEROGPU_API_KEY="zgpu-..."
export ZEROGPU_PROJECT_ID="..."
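
On the application side, it helps to fail fast at startup when either variable is missing, rather than discovering it on the first 401. A minimal sketch, assuming the two variable names above; the helper name load_zerogpu_credentials is hypothetical and not part of any ZeroGPU SDK:

```python
import os


def load_zerogpu_credentials(env=os.environ):
    """Return (api_key, project_id), raising early if either is unset or empty.

    `env` is injectable so the check can be exercised without touching the
    real process environment.
    """
    names = ("ZEROGPU_API_KEY", "ZEROGPU_PROJECT_ID")
    missing = [name for name in names if not env.get(name)]
    if missing:
        # Report only the variable names, never the values.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["ZEROGPU_API_KEY"], env["ZEROGPU_PROJECT_ID"]
```

Call it once during startup and pass the returned values to whatever client you construct.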

Handle status codes explicitly

Branch on the status code. Authentication and authorization errors are permanent - retrying them just burns time and quota. Rate limits and server errors are transient - those are the ones to retry.
| Status | Meaning | What to do |
| --- | --- | --- |
| 200 | Success | Parse and use the response. |
| 400 | Bad request | Fix the request body. Do not retry. |
| 401 | Bad API key | Check x-api-key. Do not retry. |
| 403 | Bad project ID | Check x-project-id and permissions. Do not retry. |
| 420 | Input over token limit | Shorten the input. Do not retry unchanged. |
| 429 | Rate limited | Back off and retry. Honor Retry-After. |
| 5xx | Server error | Retry with exponential backoff. |
Treat 408 (request timeout) and 409 (conflict) the same as 5xx for retry purposes. Network errors and client-side timeouts are retriable too.
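
The branching described here can be sketched as a small classifier. This is an illustration, not an official client: the Action enum and action_for are hypothetical names, and two choices are assumptions of the sketch, namely that any 2xx counts as success and that any status not listed as transient is treated as permanent.

```python
from enum import Enum


class Action(Enum):
    USE = "parse and use the response"
    FIX = "fix the request or credentials; do not retry"
    RETRY = "back off and retry"


# Transient codes named in this guide: 408, 409, 429, and any 5xx.
TRANSIENT_CODES = frozenset({408, 409, 429})


def action_for(status: int) -> Action:
    """Map an HTTP status code to the handling the table prescribes."""
    if 200 <= status < 300:  # assumption: treat every 2xx as success
        return Action.USE
    if status in TRANSIENT_CODES or 500 <= status < 600:
        return Action.RETRY
    # 400, 401, 403, 420 and anything unlisted: retrying unchanged won't help.
    return Action.FIX
```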

Set timeouts and retries

Three rules cover almost every case:
  1. Set a per-request timeout so a stalled connection can’t hang your worker.
  2. Retry only the transient codes (408, 409, 429, 5xx) and network failures - never 400, 401, or 403.
  3. Back off exponentially with jitter, cap the delay, cap the attempts, and honor the Retry-After header on 429.
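
As a rough sketch, the three rules fit in one small loop. Everything named here is an assumption made for illustration: send is a hypothetical callable that performs a single attempt with its own per-request timeout (rule 1) and returns (status, retry_after_seconds_or_None, body), and the base delay, cap, and attempt count are placeholder values to tune.

```python
import random
import time


def backoff_delay(attempt, retry_after=None, base=0.5, cap=30.0):
    """Seconds to wait before the next try (attempt is 0-based)."""
    if retry_after is not None:
        return float(retry_after)  # the server's hint takes precedence
    # Exponential growth, capped, with full jitter to spread out clients.
    return random.random() * min(cap, base * 2 ** attempt)


def call_with_retries(send, max_attempts=5, sleep=time.sleep):
    """Run send() until it succeeds, fails permanently, or attempts run out."""
    for attempt in range(max_attempts):
        last_try = attempt == max_attempts - 1
        try:
            status, retry_after, body = send()
        except (ConnectionError, TimeoutError):
            # Network failures and client-side timeouts are retriable.
            if last_try:
                raise
            sleep(backoff_delay(attempt))
            continue
        if 200 <= status < 300:
            return body
        transient = status in (408, 409, 429) or 500 <= status < 600
        if not transient or last_try:
            raise RuntimeError(f"request failed with HTTP {status}")
        sleep(backoff_delay(attempt, retry_after))
```

Injecting sleep keeps the loop testable without real waiting. Retry-After is assumed to arrive as a number of seconds; an HTTP-date value would need parsing first.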
If you call ZeroGPU through the drop-in OpenAI client, timeouts and retries are built in - set timeout and max_retries once on the client. The SDK retries 408, 409, 429, and 5xx with exponential backoff and respects Retry-After automatically.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.zerogpu.ai/v1",
    api_key="unused",  # ZeroGPU authenticates via the headers below
    default_headers={
        "x-api-key": os.environ["ZEROGPU_API_KEY"],
        "x-project-id": os.environ["ZEROGPU_PROJECT_ID"],
    },
    timeout=30.0,    # seconds, per request
    max_retries=5,   # exponential backoff on 408 / 409 / 429 / 5xx
)

resp = client.responses.create(
    model="llama-3.1-8b-instruct-fast",
    input="Your input text here...",
)
print(resp.output)
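You can override either value per request. Pass timeout directly for a longer limit on a heavy call: client.responses.create(..., timeout=60.0). To change the retry count for a single call, use client.with_options(max_retries=...).responses.create(...).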

Rolling your own

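When you call the HTTP API directly, the retry policy is yours to supply: a per-request timeout, a retriable-status check, and exponential backoff with jitter that honors Retry-After. From the shell, curl's built-in retry options cover most of this:
# curl's built-in --retry treats timeouts and HTTP 408/429/500/502/503/504 as
# transient, backs off exponentially by default (starting at 1 s and doubling),
# and honors Retry-After. It does NOT retry 400/401/403 - exactly right.
# Do not add --retry-delay: it replaces the backoff with a fixed delay.
curl --retry 5 --max-time 30 \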
  https://api.zerogpu.ai/v1/responses \
  -H "content-type: application/json" \
  -H "x-api-key: $ZEROGPU_API_KEY" \
  -H "x-project-id: $ZEROGPU_PROJECT_ID" \
  -d '{
    "model": "llama-3.1-8b-instruct-fast",
    "input": "Your input text here..."
  }'
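
For application code without the SDK, the same request can be made with the Python standard library. A minimal sketch of a single attempt, reusing the endpoint and headers from the curl example; build_request and send_once are hypothetical helper names, and the retry policy is deliberately left to the caller:

```python
import json
import urllib.error
import urllib.request

API_URL = "https://api.zerogpu.ai/v1/responses"


def build_request(payload, api_key, project_id):
    """Build the POST request; credentials are passed in, not hard-coded."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "x-api-key": api_key,
            "x-project-id": project_id,
        },
        method="POST",
    )


def send_once(request, timeout=30.0):
    """Perform one attempt and return (status, retry_after_header, body_bytes).

    urlopen raises HTTPError for 4xx/5xx, so it is converted back into a
    status code here and the caller can branch on it.
    """
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Retry-After"), resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Retry-After"), err.read()
```

Connection failures and timeouts still surface as exceptions (urllib.error.URLError or TimeoutError); treat those as retriable in whatever loop wraps send_once.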

Use the Batch API for large jobs

For large, non-real-time workloads, the Batch and Files API is the right tool instead of looping over the synchronous endpoint. It processes up to 50,000 requests within a 24-hour window at a discounted rate and sidesteps per-request rate limits entirely - so you don’t need a retry loop at all.
| You need… | Use |
| --- | --- |
| A single immediate response | The synchronous endpoint (POST /v1/responses) with the retry loop above |
| Thousands of completions, can wait minutes-to-hours | The Batch API |
| To avoid per-second rate limits during a backfill | The Batch API |
| Streaming responses | The synchronous endpoint - streaming is not supported in batch mode |
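
If a backfill is larger than the request ceiling, the work has to be split before submission. The sketch below only groups items into chunks; it assumes the 50,000 figure applies per batch, which this page does not state explicitly, and it does not show the Batch and Files API calls themselves (see the Batch & Files API page for those). chunk_requests is a hypothetical helper name.

```python
from itertools import islice

MAX_BATCH_REQUESTS = 50_000  # assumption: the limit applies to each batch


def chunk_requests(requests, size=MAX_BATCH_REQUESTS):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(requests)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Because it consumes the input lazily, the full job never needs to sit in memory at once.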

Next steps

Batch & Files API

Authentication

API Reference