The router decides which edge device handles each request. It evaluates four signals in milliseconds:
| Signal | What it optimizes |
|---|---|
| Geographic proximity | Lowest network latency |
| Device capability | Enough compute for the requested model |
| Current load | Avoids overloaded nodes |
| Model availability | Routes to nodes with the NLM already cached |
## Request flow
## Cloud fallback
**Availability guarantee.** If the edge network can't serve a request (because of capacity, model, or device constraints), cloud-hosted replicas handle it transparently. The trade-off: cloud fallback may add slightly more latency than edge, but it guarantees 100% availability. Your integration code doesn't change either way.

Further reading:

- **Distributed Inference**: the full architecture behind edge compute.
- **API Reference**: endpoint spec for `/v1/responses`.
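The cloud fallback described above can be sketched as a try-edge-then-cloud dispatcher with a single entry point, which is why caller code never changes. This is a hypothetical illustration; `respond`, `serve_on_edge`, and `serve_on_cloud` are stand-in names, not a real SDK:

```python
class EdgeUnavailable(Exception):
    """No edge node can take the request (capacity, model, or device limits)."""

def serve_on_edge(request: dict, nodes: list[str]) -> str:
    # Illustrative only: succeed when any edge node is available.
    if not nodes:
        raise EdgeUnavailable
    return f"edge:{nodes[0]}:{request['input']}"

def serve_on_cloud(request: dict) -> str:
    # Cloud replicas always accept; latency may be slightly higher than edge.
    return f"cloud:{request['input']}"

def respond(request: dict, nodes: list[str]) -> str:
    """Single entry point: callers never see or branch on the routing outcome."""
    try:
        return serve_on_edge(request, nodes)
    except EdgeUnavailable:
        return serve_on_cloud(request)
```

Because the fallback lives behind `respond`, removing every edge node changes where the request is served but not a single line of the caller's code.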
