An API for AI model inference. You send requests, get responses. No GPUs to manage — ZeroGPU handles model hosting, scaling, and routing.
Inference runs on distributed edge devices using Nano Language Models (sub-1B parameters) instead of GPU clusters. You pay per request, not for reserved capacity.
Summarization, classification, entity extraction, content moderation, intent routing, sentiment analysis — the high-volume tasks that make up most production AI traffic.
Two models: zlm-v1-summary-cloud (summarization) and zlm-v1-iab-classify-cloud (IAB content classification). Select one in your dashboard and pass it in the model field.
One endpoint: POST https://api.zerogpu.ai/v1/responses. Add your API key and project ID as headers. SDKs available for Python, JavaScript, Rust, Go, Ruby. See the Quickstart.
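A minimal sketch of calling the endpoint from Python using only the standard library. The header names (`X-API-Key`, `X-Project-ID`) and the request body shape (`model`, `input`) are assumptions for illustration; check the Quickstart and SDK reference for the exact contract.

```python
import json
import urllib.request

def build_request(api_key: str, project_id: str, model: str, text: str) -> urllib.request.Request:
    """Build a POST to the responses endpoint (not sent here)."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        "https://api.zerogpu.ai/v1/responses",
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,        # assumed header name
            "X-Project-ID": project_id,  # assumed header name
        },
        method="POST",
    )

req = build_request("zg_example_key", "proj_example", "zlm-v1-summary-cloud", "Long article text ...")
print(req.full_url)      # https://api.zerogpu.ai/v1/responses
print(req.get_method())  # POST
```

Sending the request is then one call: `urllib.request.urlopen(req)`. The official SDKs wrap this same endpoint, so the raw HTTP form is mainly useful for unsupported languages or debugging.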
Depends on the model and workload. Monitor real-time latency in Usage Analytics.
Yes. Project isolation, API key management, request logging, usage analytics, and automatic cloud fallback for availability.
Keys are shown once at creation. Revoke the lost key in API Keys and create a new one.
Yes. Create separate projects in your organization. Each gets its own keys, logs, and analytics. See Organizations & Projects.
Usage Analytics for trends (tokens, requests, latency). Logs for individual request detail.