# Production patterns

> What to get right when you move ZeroGPU from a first call into production.

Keep API keys in environment variables, never in source or client-side code. Handle the common status codes explicitly: 401 (bad API key), 403 (bad project ID), and 429 (rate limit). Set per-request timeouts, and retry 429 and 5xx responses with exponential backoff. For large, non-real-time jobs, use the Batch and Files API instead of the synchronous endpoint; note that batch mode does not support streaming. Already on the OpenAI SDK? The drop-in client shown in the [Introduction](/) is the recommended way to call ZeroGPU from application code.

## Keep secrets out of source

Read your `x-api-key` and `x-project-id` from the environment (or a secrets manager) and inject them at deploy time. Never commit them, never ship them in a browser bundle or mobile app - a key embedded in client-side code is a public key.

<Warning>
  Call ZeroGPU from your backend only. If you need to reach it from a browser or mobile client, proxy the request through a server you control so the key stays server-side.
</Warning>

```bash theme={null}
# .env (git-ignored) - load with your process manager or dotenv
export ZEROGPU_API_KEY="zgpu-..."
export ZEROGPU_PROJECT_ID="..."
```
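On the server, a minimal Python sketch of the same rule reads both values once at startup and refuses to start if either is missing, so a misconfigured deploy fails loudly instead of sending requests that come back `401` or `403`. The names `load_credentials`, `zerogpu_headers`, and `REQUIRED` are illustrative, not part of any ZeroGPU SDK; the environment variable names match the `.env` example above.

```python
import os
from typing import Mapping

# Header name -> environment variable that supplies it.
REQUIRED = {"x-api-key": "ZEROGPU_API_KEY", "x-project-id": "ZEROGPU_PROJECT_ID"}


def load_credentials(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read ZeroGPU credentials once at startup; refuse to start if any are missing."""
    missing = [var for var in REQUIRED.values() if not env.get(var)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {header: env[var] for header, var in REQUIRED.items()}


def zerogpu_headers(creds: Mapping[str, str]) -> dict[str, str]:
    """Outbound headers, built on the server from server-side credentials only."""
    return {"content-type": "application/json", **creds}
```

Call `load_credentials()` once during startup and reuse the result. If you proxy browser traffic, build the upstream request with `zerogpu_headers` so the credentials never come from, or go back to, the client.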

## Handle status codes explicitly

Branch on the status code. Authentication and authorization errors are permanent - retrying them just burns time and quota. Rate limits and server errors are transient - those are the ones to retry.

| Status | Meaning                | What to do                                              |
| ------ | ---------------------- | ------------------------------------------------------- |
| `200`  | Success                | Parse and use the response.                             |
| `400`  | Bad request            | Fix the request body. Do **not** retry.                 |
| `401`  | Bad API key            | Check `x-api-key`. Do **not** retry.                    |
| `403`  | Bad project ID         | Check `x-project-id` and permissions. Do **not** retry. |
| `420`  | Input over token limit | Shorten the input. Do **not** retry unchanged.          |
| `429`  | Rate limited           | Back off and retry. Honor `Retry-After`.                |
| `5xx`  | Server error           | Retry with exponential backoff.                         |

<Note>
  Treat `408` (request timeout) and `409` (conflict) the same as `5xx` for retry purposes. Network errors and client-side timeouts are retriable too.
</Note>
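The table and note reduce to a small classifier you can share across call sites. This is a sketch: `Action` and `classify` are hypothetical names, and treating every `5xx` as retriable follows the table's `5xx` row rather than any documented list of server codes.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"                    # 2xx: parse and use the response
    FIX_REQUEST = "fix_request"  # permanent: change the request or credentials first
    RETRY = "retry"              # transient: back off, then retry


# 408 and 409 are treated like 5xx (see the note); 429 is rate limiting.
TRANSIENT_4XX = {408, 409, 429}


def classify(status: int) -> Action:
    if 200 <= status < 300:
        return Action.OK
    if status in TRANSIENT_4XX or 500 <= status < 600:
        return Action.RETRY
    # 400, 401, 403, 420, and anything else unexpected: retrying unchanged won't help
    return Action.FIX_REQUEST
```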

## Set timeouts and retries

Three rules cover almost every case:

1. **Set a per-request timeout** so a stalled connection can't hang your worker.
2. **Retry only the transient codes** (`408`, `409`, `429`, `5xx`) and network failures - never `400`, `401`, `403`, or `420`.
3. **Back off exponentially with jitter**, cap the delay, cap the attempts, and honor the `Retry-After` header on `429`.
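Rule 3 as a Python sketch. `retry_after_seconds` accepts both forms HTTP allows for `Retry-After` (a number of seconds or an HTTP-date), and `backoff_delay` falls back to capped exponential backoff with up to one second of jitter. Both function names and the defaults (`base=1.0`, `cap=30.0`) are assumptions chosen to match the examples on this page. The cap applies only to the computed backoff, not to a server-supplied `Retry-After`.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Parse Retry-After as delay-seconds ("120") or an HTTP-date; None if unusable."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # malformed header: fall back to our own backoff
    if when is None:
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # "-0000" dates parse as naive UTC
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def backoff_delay(attempt, retry_after=None, base=1.0, cap=30.0, rng=random.random):
    """Seconds to wait before the next try; `attempt` counts from 0."""
    hint = retry_after_seconds(retry_after)
    if hint is not None:
        return hint  # the server said how long to wait
    # exponential growth capped at `cap`, plus up to 1 s of random jitter
    return min(base * 2 ** attempt, cap) + rng()
```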

### Using the OpenAI SDK (recommended)

If you call ZeroGPU through the [drop-in OpenAI client](/), timeouts and retries are built in - set `timeout` and `max_retries` once on the client. The SDK retries `408`, `409`, `429`, and `5xx` with exponential backoff and respects `Retry-After` automatically.

<CodeGroup>
  ```python Python wrap theme={null}
  import os
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.zerogpu.ai/v1",
      api_key="unused",  # ZeroGPU authenticates via the headers below
      default_headers={
          "x-api-key": os.environ["ZEROGPU_API_KEY"],
          "x-project-id": os.environ["ZEROGPU_PROJECT_ID"],
      },
      timeout=30.0,    # seconds, per request
      max_retries=5,   # exponential backoff on 408 / 409 / 429 / 5xx
  )

  resp = client.responses.create(
      model="llama-3.1-8b-instruct-fast",
      input="Your input text here...",
  )
  print(resp.output)
  ```

  ```javascript JavaScript wrap theme={null}
  import OpenAI from 'openai';

  const client = new OpenAI({
    baseURL: 'https://api.zerogpu.ai/v1',
    apiKey: 'unused', // ZeroGPU authenticates via the headers below
    defaultHeaders: {
      'x-api-key': process.env.ZEROGPU_API_KEY,
      'x-project-id': process.env.ZEROGPU_PROJECT_ID,
    },
    timeout: 30_000, // ms, per request
    maxRetries: 5,   // exponential backoff on 408 / 409 / 429 / 5xx
  });

  const resp = await client.responses.create({
    model: 'llama-3.1-8b-instruct-fast',
    input: 'Your input text here...',
  });
  console.log(resp.output);
  ```
</CodeGroup>

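You can override either value for a single call. In Python, pass `timeout=` directly (`client.responses.create(..., timeout=60.0)`) or use `client.with_options(timeout=60.0, max_retries=2).responses.create(...)`; in JavaScript, pass `{ timeout: 60_000, maxRetries: 2 }` as the second argument to `client.responses.create`.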

### Rolling your own

When you call the HTTP API directly, implement the loop yourself: a per-request timeout, a retriable-status check, and exponential backoff with jitter that honors `Retry-After`.

<CodeGroup>
  ```bash cURL wrap theme={null}
  # curl's --retry retries timeouts and HTTP 408/429/500/502/503/504, doubling the
  # wait after each attempt and honoring Retry-After. It never retries 400/401/403.
  # Leave out --retry-delay: a fixed delay replaces curl's exponential backoff.
  curl --retry 5 --max-time 30 \
    https://api.zerogpu.ai/v1/responses \
    -H "content-type: application/json" \
    -H "x-api-key: $ZEROGPU_API_KEY" \
    -H "x-project-id: $ZEROGPU_PROJECT_ID" \
    -d '{
      "model": "llama-3.1-8b-instruct-fast",
      "input": "Your input text here..."
    }'
  ```

  ```python Python wrap theme={null}
  import os
  import random
  import time

  import requests

  URL = "https://api.zerogpu.ai/v1/responses"
  RETRIABLE = {408, 409, 429, 500, 502, 503, 504}
  HEADERS = {
      "content-type": "application/json",
      "x-api-key": os.environ["ZEROGPU_API_KEY"],
      "x-project-id": os.environ["ZEROGPU_PROJECT_ID"],
  }


  def create_response(payload, max_retries=5, timeout=30):
      for attempt in range(max_retries + 1):
          retry_after = None
          try:
              resp = requests.post(URL, json=payload, headers=HEADERS, timeout=timeout)
          except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
              # network failures and timeouts are retriable
              if attempt == max_retries:
                  raise
          else:
              if resp.status_code not in RETRIABLE or attempt == max_retries:
                  # raises for 400 / 401 / 403 / 420, or a retriable code on the last attempt
                  resp.raise_for_status()
                  return resp.json()
              retry_after = resp.headers.get("retry-after")

          # exponential backoff with jitter; honor a numeric Retry-After on 429
          if retry_after and retry_after.isdigit():
              delay = float(retry_after)
          else:
              delay = min(2 ** attempt, 30) + random.random()
          time.sleep(delay)
  ```

  ```javascript JavaScript wrap theme={null}
  const ENDPOINT = 'https://api.zerogpu.ai/v1/responses';
  const RETRIABLE = new Set([408, 409, 429, 500, 502, 503, 504]);
  const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
  // exponential backoff with jitter, capped at 30 s
  const backoff = (attempt) => Math.min(2 ** attempt * 1000, 30_000) + Math.random() * 1000;

  async function createResponse(payload, { maxRetries = 5, timeoutMs = 30_000 } = {}) {
    const headers = {
      'content-type': 'application/json',
      'x-api-key': process.env.ZEROGPU_API_KEY,
      'x-project-id': process.env.ZEROGPU_PROJECT_ID,
    };

    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      let res;
      try {
        res = await fetch(ENDPOINT, {
          method: 'POST',
          headers,
          body: JSON.stringify(payload),
          signal: AbortSignal.timeout(timeoutMs), // per-request timeout
        });
      } catch (err) {
        // network failures and timeouts are retriable
        if (attempt === maxRetries) throw err;
        await sleep(backoff(attempt));
        continue;
      }

      if (!RETRIABLE.has(res.status)) {
        // 400 / 401 / 403 / 420 are permanent: throw without retrying
        if (!res.ok) throw new Error(`ZeroGPU ${res.status}: ${await res.text()}`);
        return res.json();
      }
      if (attempt === maxRetries) throw new Error(`ZeroGPU ${res.status} after ${maxRetries} retries`);

      // honor a numeric Retry-After (seconds) on 429; otherwise back off
      const retryAfter = Number(res.headers.get('retry-after'));
      await res.body?.cancel(); // discard the body so the connection can be reused
      await sleep(retryAfter > 0 ? retryAfter * 1000 : backoff(attempt));
    }
  }
  ```

  ```rust Rust wrap theme={null}
  use reqwest::Client;
  use serde_json::Value;
  use std::time::Duration;
  use tokio::time::sleep;

  const RETRIABLE: [u16; 7] = [408, 409, 429, 500, 502, 503, 504];

  async fn create_response(payload: Value, max_retries: u32) -> Result<Value, Box<dyn std::error::Error>> {
      let client = Client::builder().timeout(Duration::from_secs(30)).build()?;
      let api_key = std::env::var("ZEROGPU_API_KEY")?;
      let project_id = std::env::var("ZEROGPU_PROJECT_ID")?;

      for attempt in 0..=max_retries {
          let mut retry_after: Option<u64> = None;

          match client
              .post("https://api.zerogpu.ai/v1/responses")
              .header("content-type", "application/json")
              .header("x-api-key", &api_key)
              .header("x-project-id", &project_id)
              .json(&payload)
              .send()
              .await
          {
              Ok(resp) => {
                  let status = resp.status();
                  if !RETRIABLE.contains(&status.as_u16()) {
                      if !status.is_success() {
                          // 401 / 403 / 400 - permanent, do not retry
                          return Err(format!("zerogpu {}: {}", status, resp.text().await?).into());
                      }
                      return Ok(resp.json().await?);
                  }
                  if attempt == max_retries {
                      return Err(format!("zerogpu {} after {} retries", status, max_retries).into());
                  }
                  retry_after = resp
                      .headers()
                      .get("retry-after")
                      .and_then(|v| v.to_str().ok())
                      .and_then(|v| v.parse().ok());
              }
              Err(err) => {
                  if attempt == max_retries {
                      return Err(err.into());
                  }
              }
          }

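          // exponential backoff with jitter (via the `rand` crate); honor Retry-After on 429
          let delay = match retry_after {
              Some(secs) => Duration::from_secs(secs),
              None => Duration::from_secs_f64(2f64.powi(attempt as i32).min(30.0) + rand::random::<f64>()),
          };
          sleep(delay).await;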
      }
      unreachable!()
  }
  ```

  ```go Go wrap theme={null}
  package main

  import (
  	"bytes"
  	"encoding/json"
  	"fmt"
  	"io"
  	"math"
  	"math/rand"
  	"net/http"
  	"os"
  	"strconv"
  	"time"
  )

  var retriable = map[int]bool{408: true, 409: true, 429: true, 500: true, 502: true, 503: true, 504: true}

  func createResponse(payload map[string]any, maxRetries int) ([]byte, error) {
  	body, err := json.Marshal(payload)
  	if err != nil {
  		return nil, err
  	}
  	client := &http.Client{Timeout: 30 * time.Second}

  	for attempt := 0; ; attempt++ {
  		req, err := http.NewRequest("POST", "https://api.zerogpu.ai/v1/responses", bytes.NewReader(body))
  		if err != nil {
  			return nil, err
  		}
  		req.Header.Set("content-type", "application/json")
  		req.Header.Set("x-api-key", os.Getenv("ZEROGPU_API_KEY"))
  		req.Header.Set("x-project-id", os.Getenv("ZEROGPU_PROJECT_ID"))

  		res, err := client.Do(req)
  		if err == nil && !retriable[res.StatusCode] {
  			defer res.Body.Close()
  			data, _ := io.ReadAll(res.Body)
  			if res.StatusCode >= 400 { // 401 / 403 / 400 - permanent
  				return nil, fmt.Errorf("zerogpu %d: %s", res.StatusCode, data)
  			}
  			return data, nil
  		}
  		if attempt >= maxRetries {
  			if err != nil {
  				return nil, err
  			}
  			res.Body.Close() // release the connection before giving up
  			return nil, fmt.Errorf("zerogpu %d after %d retries", res.StatusCode, maxRetries)
  		}

  		// exponential backoff with jitter; honor Retry-After on 429
  		delay := time.Duration(math.Min(math.Pow(2, float64(attempt)), 30))*time.Second +
  			time.Duration(rand.Intn(1000))*time.Millisecond
  		if err == nil {
  			if ra := res.Header.Get("Retry-After"); ra != "" {
  				if secs, e := strconv.Atoi(ra); e == nil {
  					delay = time.Duration(secs) * time.Second
  				}
  			}
  			res.Body.Close()
  		}
  		time.Sleep(delay)
  	}
  }
  ```

  ```ruby Ruby wrap theme={null}
  require 'net/http'
  require 'uri'
  require 'json'

  ENDPOINT = URI('https://api.zerogpu.ai/v1/responses')
  RETRIABLE = [408, 409, 429, 500, 502, 503, 504].freeze

  def create_response(payload, max_retries: 5, timeout: 30)
    http = Net::HTTP.new(ENDPOINT.host, ENDPOINT.port)
    http.use_ssl = true
    http.open_timeout = timeout
    http.read_timeout = timeout

    request = Net::HTTP::Post.new(ENDPOINT.request_uri)
    request['content-type'] = 'application/json'
    request['x-api-key'] = ENV.fetch('ZEROGPU_API_KEY')
    request['x-project-id'] = ENV.fetch('ZEROGPU_PROJECT_ID')
    request.body = payload.to_json

    (0..max_retries).each do |attempt|
      delay =
        begin
          response = http.request(request)
          status = response.code.to_i
          unless RETRIABLE.include?(status)
            raise "ZeroGPU #{status}: #{response.body}" if status >= 400 # 401 / 403 / 400

            return JSON.parse(response.body)
          end
          raise "ZeroGPU #{status} after #{max_retries} retries" if attempt == max_retries

          # exponential backoff with jitter; honor Retry-After on 429
          response['retry-after']&.to_f || [2**attempt, 30].min + rand
        rescue Net::OpenTimeout, Net::ReadTimeout, Errno::ECONNRESET, Errno::ECONNREFUSED, SocketError
          # network failures and timeouts are retriable
          raise if attempt == max_retries

          [2**attempt, 30].min + rand
        end

      sleep(delay)
    end
  end
  ```
</CodeGroup>
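Every loop above mixes the retry policy with HTTP plumbing, which makes it hard to unit-test. One option, sketched here in Python, is to inject the transport and the sleep function so the policy can be exercised without a network. `with_retries`, `Reply`, and `ZeroGPUError` are hypothetical names, and `send` is assumed to raise the built-in `ConnectionError` or `TimeoutError` on network failure, so wrap your HTTP client to translate its own exceptions (for `requests`, its `ConnectionError` and `Timeout` classes are not the built-ins).

```python
import random
import time
from dataclasses import dataclass, field

RETRIABLE = {408, 409, 429, 500, 502, 503, 504}


@dataclass
class Reply:
    """What the injected transport returns (hypothetical shape)."""
    status: int
    headers: dict = field(default_factory=dict)
    body: object = None


class ZeroGPUError(Exception):
    def __init__(self, status):
        super().__init__(f"ZeroGPU returned {status}")
        self.status = status


def with_retries(send, max_retries=5, sleep=time.sleep, rng=random.random):
    """Call send() until success, a permanent error, or the retry budget runs out."""
    for attempt in range(max_retries + 1):
        retry_after = None
        try:
            reply = send()
        except (ConnectionError, TimeoutError):
            if attempt == max_retries:
                raise  # network failure on the last attempt: give up
        else:
            if 200 <= reply.status < 300:
                return reply.body
            if reply.status not in RETRIABLE or attempt == max_retries:
                raise ZeroGPUError(reply.status)  # permanent, or out of retries
            retry_after = reply.headers.get("retry-after")
        # honor a numeric Retry-After; otherwise capped exponential backoff + jitter
        if retry_after and retry_after.isdigit():
            sleep(float(retry_after))
        else:
            sleep(min(2 ** attempt, 30) + rng())
```

In tests, pass a scripted `send` and `sleep=some_list.append` to assert the exact delays; in production, pass a closure that performs the real POST and keep the default `time.sleep`.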

## Use the Batch API for large jobs

For large, non-real-time workloads, use the [Batch and Files API](/docs/batch) instead of looping over the synchronous endpoint. It processes up to 50,000 requests within a 24-hour window at a discounted rate and sidesteps per-request rate limits, so you don't need a per-request retry loop.

| You need…                                           | Use                                                                       |
| --------------------------------------------------- | ------------------------------------------------------------------------- |
| A single immediate response                         | The synchronous endpoint (`POST /v1/responses`) with the retry loop above |
| Thousands of completions, can wait minutes-to-hours | The [Batch API](/docs/batch)                                              |
| To avoid per-second rate limits during a backfill   | The [Batch API](/docs/batch)                                              |
| Streaming responses                                 | The synchronous endpoint - streaming is **not** supported in batch mode   |
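If a backfill is larger than one batch, you will need to split it. A minimal sketch, assuming that the 50,000 figure above is a per-batch cap and that batch input uses an OpenAI-style JSONL line (`custom_id`, `method`, `url`, `body`). Both are assumptions, so confirm the exact file format on the [Batch and Files API](/docs/batch) page before uploading. `batch_files` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator

MAX_REQUESTS_PER_BATCH = 50_000  # assumed per-batch cap (see lead-in)


def batch_files(inputs: Iterable[str], model: str,
                limit: int = MAX_REQUESTS_PER_BATCH) -> Iterator[str]:
    """Yield JSONL documents, each holding at most `limit` request lines."""
    lines: list[str] = []
    for i, text in enumerate(inputs):
        # ASSUMPTION: OpenAI-style batch line; check the Batch API docs for the real format.
        lines.append(json.dumps({
            "custom_id": f"req-{i}",  # unique id used to match results back to inputs
            "method": "POST",
            "url": "/v1/responses",
            "body": {"model": model, "input": text},
        }))
        if len(lines) == limit:
            yield "\n".join(lines) + "\n"
            lines = []
    if lines:
        yield "\n".join(lines) + "\n"
```

Because `batch_files` is a generator, it never holds more than one file's worth of lines in memory, which matters when the input is itself streamed from a database or a large file.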

## Next steps

<Columns cols={3}>
  <Card title="Batch & Files API" icon="layer-group" href="/docs/batch" />

  <Card title="Authentication" icon="key" href="/docs/platform#authentication" />

  <Card title="API Reference" icon="terminal" href="/api-reference/responses" />
</Columns>