What is ZeroGPU?
An API for AI model inference. You send requests, get responses. No GPUs to manage — ZeroGPU handles model hosting, scaling, and routing.
How does it reduce costs?
Inference runs on distributed edge devices using Nano Language Models (sub-1B parameters) instead of GPU clusters. You pay per request, not for reserved capacity.
What workloads does it support?
Summarization, classification, entity extraction, content moderation, intent routing, sentiment analysis — the high-volume tasks that make up most production AI traffic.
What models are available?
Two models: zlm-v1-summary-cloud (summarization) and zlm-v1-iab-classify-cloud (IAB content classification). Select one in your dashboard and pass it in the model field.
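As a sketch, the selected model ID goes in the request body's model field. Only the model field and the two model IDs are documented here; the input field name is an assumption for illustration.

```python
# Request body for a summarization call.
# "model" is documented; "input" is an assumed field name.
payload = {
    "model": "zlm-v1-summary-cloud",  # or "zlm-v1-iab-classify-cloud"
    "input": "Long article text to summarize...",
}
```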
How do I integrate it?
One endpoint: POST https://api.zerogpu.ai/v1/responses. Add your API key and project ID as headers. SDKs available for Python, JavaScript, Rust, Go, and Ruby. See the Quickstart.
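A minimal sketch of assembling that request with the Python standard library. The exact header names (Authorization with a Bearer token, X-Project-ID) and the input body field are assumptions, not documented here — check the Quickstart for the real ones.

```python
import json
import urllib.request

API_URL = "https://api.zerogpu.ai/v1/responses"

def build_request(api_key: str, project_id: str, model: str, text: str) -> urllib.request.Request:
    """Assemble a POST to the responses endpoint.

    Header and body field names are assumptions for illustration.
    """
    body = json.dumps({"model": model, "input": text}).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "X-Project-ID": project_id,            # assumed header name
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("zk-example", "proj-example", "zlm-v1-summary-cloud", "Summarize this.")
# urllib.request.urlopen(req) would send it; omitted in this sketch.
```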
What latency should I expect?
Depends on the model and workload. Monitor real-time latency in Usage Analytics.
Is it production-ready?
Yes. You get project isolation, API key management, request logging, usage analytics, and automatic cloud fallback for availability.
I lost my API key.
Keys are shown once at creation. Revoke the lost key in API Keys and create a new one.
Can I separate dev and production?
Yes. Create separate projects in your organization. Each gets its own keys, logs, and analytics. See Organizations & Projects.
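One common pattern is keeping each project's key in its own environment variable and selecting by environment. The variable names below are conventions of this sketch, not anything ZeroGPU defines.

```python
import os

# One key per project; which one is used depends on the environment.
# Variable names are assumptions for illustration.
KEY_VARS = {
    "development": "ZEROGPU_DEV_API_KEY",
    "production": "ZEROGPU_PROD_API_KEY",
}

env = os.environ.get("APP_ENV", "development")
api_key = os.environ.get(KEY_VARS[env], "")
```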
How do I monitor usage?
Use Usage Analytics for trends (tokens, requests, latency) and Logs for individual request detail.

