Specialized small and nano models
ZeroGPU runs purpose-built ZLMs (ZeroGPU Language Models) for high-volume tasks like IAB classification and signal extraction, alongside a catalog of open small and nano models - DeBERTa, GLiNER, LFM2.5, Llama 3.1 8B. Small enough to run at the edge, good enough for production. See the Model Catalog.
An edge-powered network
Requests run across a hybrid of:
Edge devices
Phones, gaming PCs.
Optimized edge servers
Mid-sized models, higher load.
Cloud fallback
Consistent performance and burst capacity.
Routing
For each request, ZeroGPU picks the right model and the right compute by capability, availability, and load - and routes geographically to cut latency. You call one endpoint; the orchestration is handled for you.
This is network-side routing (which model, which compute). It’s distinct from the ZeroGPU Router in the skills/plugins, which decides - on the client side - which steps of an agent run to offload to ZeroGPU at all. See Integrations → Skills + CLI (Claude Code).
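To make the network-side routing described above more concrete, here is a minimal sketch of how selection by capability, availability, load, and geographic latency could work. It is an illustration only, not ZeroGPU's actual routing logic: the `Node` type, the `pick_node` function, the `max_load` threshold, and the node and model names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one compute target in the network."""
    name: str
    tier: str              # "edge-device", "edge-server", or "cloud"
    models: frozenset      # model identifiers this node can serve (capability)
    available: bool        # whether the node is currently reachable
    load: float            # 0.0 (idle) to 1.0 (saturated)
    latency_ms: float      # estimated round trip from the caller's region


def pick_node(nodes: list[Node], model: str, max_load: float = 0.8) -> Optional[Node]:
    """Pick a node for `model`, or None if nothing can serve it.

    Assumed policy: filter by capability, availability, and load, then
    prefer the lowest-latency node. If every capable node is over the load
    threshold, fall back to the least-loaded available cloud node.
    """
    capable = [n for n in nodes if model in n.models and n.available]
    candidates = [n for n in capable if n.load < max_load]
    if candidates:
        # Lowest latency first; break ties on lower load.
        return min(candidates, key=lambda n: (n.latency_ms, n.load))
    cloud = [n for n in capable if n.tier == "cloud"]
    return min(cloud, key=lambda n: n.load, default=None)


# Hypothetical network: one phone, one edge server, one cloud region.
nodes = [
    Node("phone-1", "edge-device", frozenset({"deberta"}), True, 0.2, 15.0),
    Node("edge-fra", "edge-server", frozenset({"deberta", "gliner"}), True, 0.5, 25.0),
    Node("cloud-us", "cloud", frozenset({"deberta", "gliner", "llama-3.1-8b"}), True, 0.9, 120.0),
]

small = pick_node(nodes, "deberta")       # nearest capable node under the load limit
large = pick_node(nodes, "llama-3.1-8b")  # only the cloud serves it, so cloud fallback applies
```

In this sketch the caller only names a model; which tier ends up serving the request falls out of the filter and the sort, which mirrors the "one endpoint" idea in the text. A real scheduler would presumably weigh more signals than these four fields.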

