> ## Documentation Index > Fetch the complete documentation index at: https://docs.zerogpu.ai/llms.txt > Use this file to discover all available pages before exploring further. # How ZeroGPU works > ZeroGPU turns idle compute into one programmable inference layer - specialized small models, an edge-powered network, and routing handled for you. ZeroGPU turns idle compute into one programmable inference layer. You send a request; we run it on a specialized small or nano model, on the cheapest compute that can serve it well. ## Specialized small and nano models ZeroGPU runs purpose-built ZLMs (ZeroGPU Language Models) for high-volume tasks like IAB classification and signal extraction, alongside a catalog of open small and nano models - DeBERTa, GLiNER, LFM2.5, Llama 3.1 8B. Small enough to run at the edge, good enough for production. See the [Model Catalog](/docs/model-catalog). ## An edge-powered network Requests run across a hybrid of: Phones, gaming PCs. Mid-sized models, higher load. Consistent performance and burst capacity. ## Routing For each request, ZeroGPU picks the right model and the right compute by capability, availability, and load - and routes geographically to cut latency. You call one endpoint; the orchestration is handled for you. This is network-side routing (which model, which compute). It's distinct from the ZeroGPU Router in the skills/plugins, which decides - on the client side - which steps of an agent run to offload to ZeroGPU at all. See [Integrations → Skills + CLI (Claude Code)](/integrations/claude-code-plugin).