Production patterns

Keep API keys in environment variables, never in source or client-side code. Handle the common status codes explicitly: 401 (bad API key), 403 (bad project ID), and 429 (rate limit). Set sensible timeouts and retries with backoff on 429 and 5xx. For large, non-real-time jobs, use the Batch and Files API instead of the synchronous endpoint - streaming is not supported in batch mode. Already on the OpenAI SDK? The drop-in client shown in the Introduction is the recommended way to call ZeroGPU from application code.

Keep secrets out of source

Read your x-api-key and x-project-id from the environment (or a secrets manager) and inject them at deploy time. Never commit them, never ship them in a browser bundle or mobile app - a key embedded in client-side code is a public key.

ZeroGPU calls authenticate from your backend. If you need to call from a browser or mobile client, proxy the request through a server you control so the key stays server-side.
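The proxy idea can be sketched with nothing but the Python standard library. This is a minimal illustration, not a production server: the `/api/generate` route and the helper name `build_upstream_request` are hypothetical, while the upstream URL and header names are the ones used on this page. A real proxy would also authenticate and rate-limit its own callers.

```python
import os
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler

UPSTREAM = "https://api.zerogpu.ai/v1/responses"


def build_upstream_request(body: bytes, env=os.environ) -> urllib.request.Request:
    """Attach server-side credentials to a client-supplied request body."""
    return urllib.request.Request(
        UPSTREAM,
        data=body,
        method="POST",
        headers={
            "content-type": "application/json",
            "x-api-key": env["ZEROGPU_API_KEY"],
            "x-project-id": env["ZEROGPU_PROJECT_ID"],
        },
    )


class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/api/generate":  # hypothetical route
            self.send_error(404)
            return
        length = int(self.headers.get("content-length", 0))
        body = self.rfile.read(length)
        try:
            with urllib.request.urlopen(build_upstream_request(body), timeout=30) as upstream:
                status, payload = upstream.status, upstream.read()
        except urllib.error.HTTPError as err:
            # Pass upstream 4xx/5xx through so the caller can branch on them.
            status, payload = err.code, err.read()
        except OSError:
            # Network failure or timeout while reaching the upstream.
            status, payload = 502, b'{"error": "upstream unreachable"}'
        self.send_response(status)
        self.send_header("content-type", "application/json")
        self.send_header("content-length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

To try it locally, serve the handler with `http.server.HTTPServer(("127.0.0.1", 8080), ProxyHandler).serve_forever()`. The browser or mobile client then posts to your route and never sees the key.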

# .env (git-ignored) - load with your process manager or dotenv
export ZEROGPU_API_KEY="zgpu-..."
export ZEROGPU_PROJECT_ID="..."
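On the application side, it can help to read both values once at startup and fail fast if either is missing, rather than discovering the problem on the first 401. A minimal sketch, assuming the two variable names above; the function name `load_zerogpu_headers` is hypothetical:

```python
import os

REQUIRED_VARS = ("ZEROGPU_API_KEY", "ZEROGPU_PROJECT_ID")


def load_zerogpu_headers(env=os.environ) -> dict:
    """Build the auth headers from the environment, failing fast if any are unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return {
        "x-api-key": env["ZEROGPU_API_KEY"],
        "x-project-id": env["ZEROGPU_PROJECT_ID"],
    }
```

Call it once during startup so a misconfigured deploy fails immediately instead of on the first live request.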

Handle status codes explicitly

Branch on the status code. Authentication and authorization errors are permanent - retrying them just burns time and quota. Rate limits and server errors are transient - those are the ones to retry.

Status	Meaning	What to do
`200`	Success	Parse and use the response.
`400`	Bad request	Fix the request body. Do not retry.
`401`	Bad API key	Check `x-api-key`. Do not retry.
`403`	Bad project ID	Check `x-project-id` and permissions. Do not retry.
`420`	Input over token limit	Shorten the input. Do not retry unchanged.
`429`	Rate limited	Back off and retry. Honor `Retry-After`.
`5xx`	Server error	Retry with exponential backoff.

Treat 408 (request timeout) and 409 (conflict) the same as 5xx for retry purposes. Network errors and client-side timeouts are retriable too.
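The table and the note above boil down to a three-way decision. A small sketch of that branching, assuming any 2xx counts as success and any unlisted 4xx is permanent; the function name and return labels are illustrative only:

```python
def classify_status(status: int) -> str:
    """Map an HTTP status to 'ok', 'retry', or 'fail' following the table above."""
    if 200 <= status < 300:
        return "ok"
    if status in (408, 409, 429) or 500 <= status < 600:
        return "retry"  # transient: back off and try again
    return "fail"  # 400, 401, 403, 420, and any other 4xx: fix the request instead
```

Keeping the decision in one function means the retry loop and your logging agree on what counts as transient.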

Set timeouts and retries

Three rules cover almost every case:

1. Set a per-request timeout so a stalled connection can’t hang your worker.
2. Retry only the transient codes (408, 409, 429, 5xx) and network failures - never 400, 401, or 403.
3. Back off exponentially with jitter, cap the delay, cap the attempts, and honor the Retry-After header on 429.
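The third rule is the one that is easiest to get subtly wrong, so here is a sketch of just the delay calculation. The 1-second base and 30-second cap are illustrative defaults, not documented ZeroGPU values, and the helper names are hypothetical. It accepts Retry-After either as a number of seconds or as an HTTP date, and uses the server's value whenever one is present.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Return Retry-After as seconds, or None if absent or unparseable."""
    if not value:
        return None
    try:
        return max(0.0, float(value))  # delta-seconds form, e.g. "5"
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def backoff_delay(attempt, retry_after=None, base=1.0, cap=30.0, rng=random.random):
    """Seconds to sleep before retry number `attempt` (0-based)."""
    server_hint = parse_retry_after(retry_after)
    if server_hint is not None:
        return server_hint  # the server's instruction wins over our own schedule
    # Exponential growth, capped, plus up to one second of jitter.
    return min(base * 2 ** attempt, cap) + rng()
```

The jitter keeps many workers that failed at the same moment from retrying in lockstep; the cap keeps a long outage from producing multi-minute sleeps.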

Using the OpenAI SDK (recommended)

If you call ZeroGPU through the drop-in OpenAI client, timeouts and retries are built in - set timeout and max_retries once on the client. The SDK retries 408, 409, 429, and 5xx with exponential backoff and respects Retry-After automatically.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.zerogpu.ai/v1",
    api_key="unused",  # ZeroGPU authenticates via the headers below
    default_headers={
        "x-api-key": os.environ["ZEROGPU_API_KEY"],
        "x-project-id": os.environ["ZEROGPU_PROJECT_ID"],
    },
    timeout=30.0,    # seconds, per request
    max_retries=5,   # exponential backoff on 408 / 409 / 429 / 5xx
)

resp = client.responses.create(
    model="llama-3.1-8b-instruct-fast",
    input="Your input text here...",
)
print(resp.output)

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.zerogpu.ai/v1',
  apiKey: 'unused', // ZeroGPU authenticates via the headers below
  defaultHeaders: {
    'x-api-key': process.env.ZEROGPU_API_KEY,
    'x-project-id': process.env.ZEROGPU_PROJECT_ID,
  },
  timeout: 30_000, // ms, per request
  maxRetries: 5,   // exponential backoff on 408 / 409 / 429 / 5xx
});

const resp = await client.responses.create({
  model: 'llama-3.1-8b-instruct-fast',
  input: 'Your input text here...',
});
console.log(resp.output);

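You can override either value per request. In Python, pass timeout directly on the call, for example a longer timeout on a heavy call: client.responses.create(..., timeout=60.0); to change the retry count for one call, go through client.with_options(max_retries=...). In JavaScript, pass { timeout, maxRetries } as the second argument to create().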

Rolling your own

When you call the HTTP API directly, implement the loop yourself: a per-request timeout, a retriable-status check, and exponential backoff with jitter that honors Retry-After.

# curl's built-in --retry handles 408/429/500/502/503/504 with exponential
# backoff and honors Retry-After. It does NOT retry 401/403/400 - exactly right.
# Leave --retry-delay unset: setting it replaces the backoff with a fixed delay.
curl --retry 5 --max-time 30 \
  https://api.zerogpu.ai/v1/responses \
  -H "content-type: application/json" \
  -H "x-api-key: $ZEROGPU_API_KEY" \
  -H "x-project-id: $ZEROGPU_PROJECT_ID" \
  -d '{
    "model": "llama-3.1-8b-instruct-fast",
    "input": "Your input text here..."
  }'

import os
import random
import time
import requests

URL = "https://api.zerogpu.ai/v1/responses"
RETRIABLE = {408, 409, 429, 500, 502, 503, 504}
HEADERS = {
    "content-type": "application/json",
    "x-api-key": os.environ["ZEROGPU_API_KEY"],
    "x-project-id": os.environ["ZEROGPU_PROJECT_ID"],
}


def create_response(payload, max_retries=5, timeout=30):
    for attempt in range(max_retries + 1):
        retry_after = None
        try:
            resp = requests.post(URL, json=payload, headers=HEADERS, timeout=timeout)
        except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
            # network failure or client-side timeout - retriable
            if attempt == max_retries:
                raise
        else:
            # handled outside the try so the HTTPError below is never swallowed and retried
            if resp.status_code not in RETRIABLE:
                resp.raise_for_status()  # raises on 401 / 403 / 400 - no retry
                return resp.json()
            if attempt == max_retries:
                resp.raise_for_status()  # out of attempts - surface the last error
            retry_after = resp.headers.get("retry-after")

        # exponential backoff with jitter; honor Retry-After on 429
        try:
            delay = float(retry_after)
        except (TypeError, ValueError):  # header absent or not a number of seconds
            delay = min(2 ** attempt, 30) + random.random()
        time.sleep(delay)

const URL = 'https://api.zerogpu.ai/v1/responses';
const RETRIABLE = new Set([408, 409, 429, 500, 502, 503, 504]);
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

async function createResponse(payload, { maxRetries = 5, timeoutMs = 30_000 } = {}) {
  const headers = {
    'content-type': 'application/json',
    'x-api-key': process.env.ZEROGPU_API_KEY,
    'x-project-id': process.env.ZEROGPU_PROJECT_ID,
  };

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    // exponential backoff with jitter; replaced by Retry-After when the server sends one
    let delay = Math.min(2 ** attempt * 1000, 30_000) + Math.random() * 1000;
    try {
      const res = await fetch(URL, {
        method: 'POST',
        headers,
        body: JSON.stringify(payload),
        signal: controller.signal,
      });

      if (!RETRIABLE.has(res.status)) {
        if (!res.ok) {
          // 401 / 403 / 400 - permanent; flag it so the catch below rethrows instead of retrying
          const err = new Error(`ZeroGPU ${res.status}: ${await res.text()}`);
          err.permanent = true;
          throw err;
        }
        return await res.json();
      }
      if (attempt === maxRetries) throw new Error(`ZeroGPU ${res.status} after ${maxRetries} retries`);

      // honor Retry-After (in seconds) on 429
      const retryAfter = Number(res.headers.get('retry-after'));
      if (retryAfter > 0) delay = retryAfter * 1000;
    } catch (err) {
      // network errors and timeouts are retriable; permanent errors are not
      if (err.permanent || attempt === maxRetries) throw err;
    } finally {
      clearTimeout(timer);
    }
    await sleep(delay);
  }
}

use reqwest::Client;
use serde_json::Value;
use std::time::Duration;
use tokio::time::sleep;

const RETRIABLE: [u16; 7] = [408, 409, 429, 500, 502, 503, 504];

async fn create_response(payload: Value, max_retries: u32) -> Result<Value, Box<dyn std::error::Error>> {
    let client = Client::builder().timeout(Duration::from_secs(30)).build()?;
    let api_key = std::env::var("ZEROGPU_API_KEY")?;
    let project_id = std::env::var("ZEROGPU_PROJECT_ID")?;

    for attempt in 0..=max_retries {
        let mut retry_after: Option<u64> = None;

        match client
            .post("https://api.zerogpu.ai/v1/responses")
            .header("content-type", "application/json")
            .header("x-api-key", &api_key)
            .header("x-project-id", &project_id)
            .json(&payload)
            .send()
            .await
        {
            Ok(resp) => {
                let status = resp.status();
                if !RETRIABLE.contains(&status.as_u16()) {
                    if !status.is_success() {
                        // 401 / 403 / 400 - permanent, do not retry
                        return Err(format!("zerogpu {}: {}", status, resp.text().await?).into());
                    }
                    return Ok(resp.json().await?);
                }
                if attempt == max_retries {
                    return Err(format!("zerogpu {} after {} retries", status, max_retries).into());
                }
                retry_after = resp
                    .headers()
                    .get("retry-after")
                    .and_then(|v| v.to_str().ok())
                    .and_then(|v| v.parse().ok());
            }
            Err(err) => {
                if attempt == max_retries {
                    return Err(err.into());
                }
            }
        }

        // exponential backoff; honor Retry-After on 429
        let backoff = retry_after.unwrap_or_else(|| 2u64.pow(attempt).min(30));
        sleep(Duration::from_secs(backoff)).await;
    }
    unreachable!()
}

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"math"
	"math/rand"
	"net/http"
	"os"
	"strconv"
	"time"
)

var retriable = map[int]bool{408: true, 409: true, 429: true, 500: true, 502: true, 503: true, 504: true}

func createResponse(payload map[string]any, maxRetries int) ([]byte, error) {
	body, _ := json.Marshal(payload)
	client := &http.Client{Timeout: 30 * time.Second}

	for attempt := 0; ; attempt++ {
		req, _ := http.NewRequest("POST", "https://api.zerogpu.ai/v1/responses", bytes.NewReader(body))
		req.Header.Set("content-type", "application/json")
		req.Header.Set("x-api-key", os.Getenv("ZEROGPU_API_KEY"))
		req.Header.Set("x-project-id", os.Getenv("ZEROGPU_PROJECT_ID"))

		res, err := client.Do(req)
		if err == nil && !retriable[res.StatusCode] {
			defer res.Body.Close()
			data, _ := io.ReadAll(res.Body)
			if res.StatusCode >= 400 { // 401 / 403 / 400 - permanent
				return nil, fmt.Errorf("zerogpu %d: %s", res.StatusCode, data)
			}
			return data, nil
		}
		if attempt >= maxRetries {
			if err != nil {
				return nil, err
			}
			return nil, fmt.Errorf("zerogpu %d after %d retries", res.StatusCode, maxRetries)
		}

		// exponential backoff with jitter; honor Retry-After on 429
		delay := time.Duration(math.Min(math.Pow(2, float64(attempt)), 30))*time.Second +
			time.Duration(rand.Intn(1000))*time.Millisecond
		if err == nil {
			if ra := res.Header.Get("Retry-After"); ra != "" {
				if secs, e := strconv.Atoi(ra); e == nil {
					delay = time.Duration(secs) * time.Second
				}
			}
			res.Body.Close()
		}
		time.Sleep(delay)
	}
}

require 'net/http'
require 'uri'
require 'json'

ENDPOINT = URI('https://api.zerogpu.ai/v1/responses')
RETRIABLE = [408, 409, 429, 500, 502, 503, 504].freeze

def create_response(payload, max_retries: 5, timeout: 30)
  http = Net::HTTP.new(ENDPOINT.host, ENDPOINT.port)
  http.use_ssl = true
  http.open_timeout = timeout
  http.read_timeout = timeout

  request = Net::HTTP::Post.new(ENDPOINT.request_uri)
  request['content-type'] = 'application/json'
  request['x-api-key'] = ENV.fetch('ZEROGPU_API_KEY')
  request['x-project-id'] = ENV.fetch('ZEROGPU_PROJECT_ID')
  request.body = payload.to_json

  (0..max_retries).each do |attempt|
    delay =
      begin
        response = http.request(request)
        status = response.code.to_i
        unless RETRIABLE.include?(status)
          raise "ZeroGPU #{status}: #{response.body}" if status >= 400 # 401 / 403 / 400

          return JSON.parse(response.body)
        end
        raise "ZeroGPU #{status} after #{max_retries} retries" if attempt == max_retries

        # exponential backoff with jitter; honor Retry-After on 429
        response['retry-after']&.to_f || [2**attempt, 30].min + rand
      rescue Net::OpenTimeout, Net::ReadTimeout, SocketError, EOFError, Errno::ECONNRESET, Errno::ECONNREFUSED
        raise if attempt == max_retries

        [2**attempt, 30].min + rand
      end

    sleep(delay)
  end
end

Use the Batch API for large jobs

For large, non-real-time workloads, the Batch and Files API is the right tool instead of looping over the synchronous endpoint. It processes up to 50,000 requests within a 24-hour window at a discounted rate and sidesteps per-request rate limits entirely - so you don’t need a retry loop at all.

You need…	Use
A single immediate response	The synchronous endpoint (`POST /v1/responses`) with the retry loop above
Thousands of completions, can wait minutes-to-hours	The Batch API
To avoid per-second rate limits during a backfill	The Batch API
Streaming responses	The synchronous endpoint - streaming is not supported in batch mode
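Given the 50,000-request figure above, a backfill larger than that has to be split across several batches. A minimal sketch of the splitting step only; the helper name is hypothetical, and the request format and submission calls are not shown here because they belong to the Batch and Files API documentation:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")
MAX_REQUESTS_PER_BATCH = 50_000  # limit quoted on this page


def chunk_requests(items: Iterable[T], size: int = MAX_REQUESTS_PER_BATCH) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Because it consumes any iterable lazily, the same helper works whether the requests come from a list in memory or are streamed from a file or database cursor.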

Next steps

Batch & Files API

Authentication

API Reference