ZeroGPU is the compute efficiency layer for AI inference. It runs repeatable, high-volume AI tasks on specialized small and nano language models across an edge-powered network, so the same workloads run faster and cheaper than on centralized GPUs. Send a request to one OpenAI-compatible endpoint; ZeroGPU picks the right small or nano model and runs it on the right compute. Use frontier models for reasoning, and ZeroGPU for repeatable execution at scale: classification, extraction, moderation, summarization, and routing.

Make your first call

Find your API key and project ID in the ZeroGPU dashboard.
curl --location 'https://api.zerogpu.ai/v1/responses' \
  --header 'content-type: application/json' \
  --header 'x-api-key: YOUR_API_KEY' \
  --header 'x-project-id: YOUR_PROJECT_ID' \
  --data '{
  "input": "Technology has quietly reshaped the rhythm of everyday life, weaving itself into routines so seamlessly that it often goes unnoticed. From the moment a smartphone alarm wakes someone in the morning to the final glance at a glowing screen before sleep, digital systems guide communication, navigation, work, and entertainment. This transformation did not happen overnight. It emerged through decades of incremental innovation, each new tool building upon the last, until the modern world became deeply interconnected. One of the most significant changes has been the speed at which information travels.",
  "model": "zlm-v1-iab-classify-edge"
}'
Already using the OpenAI SDK? Point it at ZeroGPU:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.zerogpu.ai/v1",
    api_key="unused",  # ZeroGPU authenticates via the headers below
    default_headers={
        "x-api-key": "YOUR_API_KEY",
        "x-project-id": "YOUR_PROJECT_ID",
    },
)

resp = client.responses.create(
    model="zlm-v1-iab-classify-edge",
    input="Technology has quietly reshaped the rhythm of everyday life, weaving itself into routines so seamlessly that it often goes unnoticed. From the moment a smartphone alarm wakes someone in the morning to the final glance at a glowing screen before sleep, digital systems guide communication, navigation, work, and entertainment. This transformation did not happen overnight. It emerged through decades of incremental innovation, each new tool building upon the last, until the modern world became deeply interconnected. One of the most significant changes has been the speed at which information travels.",
)
🎉 Either call returns the same classification result:
{
  "audience": [
    { "name": "Technology & Computing", "score": 0.8194953318627214 },
    { "name": "Consumer Electronics", "score": 0.7730970665427267 },
    { "name": "IT and Internet Support", "score": 0.6476231908025892 }
  ],
  "content": {
    "iab_1_0": [
      { "name": "Technology & Computing", "score": 0.8130967783784443 },
      { "name": "Automotive", "score": 0.6588144281246334 }
    ],
    "iab_2_2": [
      { "name": "Technology & Computing", "score": 0.8130967783784443 },
      { "name": "Wearable Technology", "score": 0.7080836335415118 },
      { "name": "Auto Infotainment Technologies", "score": 0.6588144281246334 }
    ]
  }
}
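
To act on this result, a caller would typically keep the highest-scoring label in each taxonomy. The sketch below assumes the response body has exactly the shape shown above; `top_label` is a hypothetical helper written for this example, not part of any ZeroGPU SDK.

```python
import json

# Sample body copied from the response above (scores rounded for readability).
SAMPLE = """
{
  "audience": [
    { "name": "Technology & Computing", "score": 0.819 },
    { "name": "Consumer Electronics", "score": 0.773 },
    { "name": "IT and Internet Support", "score": 0.648 }
  ],
  "content": {
    "iab_1_0": [
      { "name": "Technology & Computing", "score": 0.813 },
      { "name": "Automotive", "score": 0.659 }
    ],
    "iab_2_2": [
      { "name": "Technology & Computing", "score": 0.813 },
      { "name": "Wearable Technology", "score": 0.708 },
      { "name": "Auto Infotainment Technologies", "score": 0.659 }
    ]
  }
}
"""


def top_label(entries):
    """Return the (name, score) pair with the highest score, or None if empty."""
    if not entries:
        return None
    best = max(entries, key=lambda entry: entry["score"])
    return best["name"], best["score"]


result = json.loads(SAMPLE)
audience_top = top_label(result["audience"])
# One winner per taxonomy key found under "content".
content_top = {
    taxonomy: top_label(entries)
    for taxonomy, entries in result["content"].items()
}

print(audience_top)
print(content_top)
```

Using `max` rather than taking the first element avoids relying on the entries being sorted by score, which the example response suggests but the page does not state.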

What you can run

Text classification

Text generation

PII detection

Summarization

What you get

Lower cost and latency

Specialized small and nano models on distributed compute: roughly 10x faster and more than 50% cheaper than centralized GPUs on production tasks.

One OpenAI-compatible API

Send requests to POST /v1/responses. Existing OpenAI SDK code needs only a new base URL and the two ZeroGPU headers.
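
For callers that do not use the OpenAI SDK, the same POST can be assembled with Python's standard library. This is a minimal sketch that mirrors the curl example; `build_request` is a hypothetical helper name, and the request is only constructed here, not sent.

```python
import json
import urllib.request

API_URL = "https://api.zerogpu.ai/v1/responses"


def build_request(api_key, project_id, model, text):
    """Build (but do not send) a POST request matching the curl example."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "content-type": "application/json",
            "x-api-key": api_key,
            "x-project-id": project_id,
        },
    )


req = build_request(
    "YOUR_API_KEY",
    "YOUR_PROJECT_ID",
    "zlm-v1-iab-classify-edge",
    "Some text to classify.",
)
print(req.get_method(), req.full_url)
```

With real credentials in place, passing `req` to `urllib.request.urlopen` would send the request and return the HTTP response.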

Built for production

Track token usage, latency, and volume per request, and keep dev, staging, and production isolated in separate projects.
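
On the client side, project isolation can be handled by keeping one project ID per environment. The sketch below is one possible arrangement, assuming a single API key shared across projects (which may not match your account setup). The environment variable names (`ZEROGPU_API_KEY`, `ZEROGPU_PROJECT_ID_DEV`, and so on) and the `headers_for` helper are assumptions made for illustration, not names ZeroGPU defines.

```python
import os

ENVIRONMENTS = ("dev", "staging", "production")


def headers_for(environment, env=os.environ):
    """Return the auth headers for one environment.

    The variable names read here are hypothetical; adapt them to your setup.
    """
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    return {
        "x-api-key": env["ZEROGPU_API_KEY"],
        "x-project-id": env[f"ZEROGPU_PROJECT_ID_{environment.upper()}"],
    }


# An in-memory mapping stands in for real environment variables.
fake_env = {
    "ZEROGPU_API_KEY": "YOUR_API_KEY",
    "ZEROGPU_PROJECT_ID_DEV": "YOUR_DEV_PROJECT_ID",
    "ZEROGPU_PROJECT_ID_STAGING": "YOUR_STAGING_PROJECT_ID",
    "ZEROGPU_PROJECT_ID_PRODUCTION": "YOUR_PRODUCTION_PROJECT_ID",
}
print(headers_for("staging", fake_env))
```

The returned dictionary can be passed as `default_headers` to the OpenAI client shown earlier, so switching environments changes only the project ID.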

How it works

Workload analysis → Model selection → Edge orchestration. ZeroGPU classifies each task, picks the fastest viable model, and routes it to the right compute. See How ZeroGPU works for the full picture.
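
As a rough mental model of "picks the fastest viable model", the selection step can be pictured as filtering candidates by task and taking the one with the lowest latency. The sketch below is purely illustrative: the catalog entries, latency figures, and `pick_model` function are invented for this example and do not describe ZeroGPU's actual router.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str          # hypothetical model name
    tasks: frozenset   # tasks this model is considered viable for
    latency_ms: float  # made-up typical latency


# Invented catalog; real model names and numbers will differ.
CATALOG = [
    Candidate("example-nano-classifier", frozenset({"classification"}), 8.0),
    Candidate(
        "example-small-generalist",
        frozenset({"classification", "summarization"}),
        40.0,
    ),
]


def pick_model(task, catalog=CATALOG):
    """Return the lowest-latency candidate that supports the task."""
    viable = [c for c in catalog if task in c.tasks]
    if not viable:
        raise LookupError(f"no viable model for task: {task!r}")
    return min(viable, key=lambda c: c.latency_ms)


print(pick_model("classification").name)  # example-nano-classifier
print(pick_model("summarization").name)   # example-small-generalist
```

A real router would also weigh factors this toy ignores, such as where compute is available, which is the "edge orchestration" stage named above.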

Go deeper

Quickstart

How ZeroGPU works

Model Catalog

API Reference