The router decides which edge device handles each request. It evaluates four signals in milliseconds:
| Signal | What it optimizes |
| --- | --- |
| Geographic proximity | Lowest network latency |
| Device capability | Enough compute for the requested model |
| Current load | Avoids overloaded nodes |
| Model availability | Routes to nodes with the NLM already cached |

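One way to picture how these four signals combine is a scoring function over candidate nodes. This is a minimal sketch: the `Node` fields, the hard capability constraint, and the weights are all illustrative assumptions, not the router's actual internals.

```python
from dataclasses import dataclass

@dataclass
class Node:
    """Hypothetical view of one edge device in the topology."""
    distance_ms: float   # estimated network RTT to the client
    free_compute: float  # available compute, normalized 0..1
    load: float          # current utilization, 0 (idle) to 1 (saturated)
    cached_models: set   # model identifiers already on the device

def score(node: Node, model_id: str, required_compute: float) -> float:
    """Combine the four routing signals into one comparable score.
    Weights are made up for illustration."""
    if node.free_compute < required_compute:
        return float("-inf")       # device capability: hard constraint
    s = -node.distance_ms          # geographic proximity: lower RTT is better
    s -= 50.0 * node.load          # current load: penalize busy nodes
    if model_id in node.cached_models:
        s += 100.0                 # model availability: avoid a cold model fetch
    return s
```

The router would then pick the highest-scoring node; a node that cannot fit the model is excluded outright rather than merely penalized.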
## Request flow

1. **Request arrives.** The API gateway extracts the model identifier, payload size, and origin IP.
2. **Node selected.** The router queries the network topology; the best node that satisfies all constraints wins.
3. **Inference runs.** The request is forwarded to the edge device, where the NLM processes the input and returns the result.
4. **Fallback if needed.** If no edge node responds in time, the request goes to cloud infrastructure. The response format is identical, so your app doesn't know the difference.

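The four steps above can be sketched end to end. Everything here is a hypothetical stand-in: `topology`, `cloud`, the `infer` signatures, and the timeout value are assumptions for illustration, not the product's API.

```python
EDGE_TIMEOUT_S = 0.2  # illustrative deadline; the router's real budget is not documented here

def handle_request(model_id, payload, origin_ip, topology, cloud):
    """Sketch of the four-step request flow. `topology` and `cloud`
    are hypothetical stand-ins for the router's internal services."""
    # 1. Request arrives: the gateway has already extracted the model
    #    identifier, payload size, and origin IP.
    # 2. Node selected: best node that satisfies all constraints.
    node = topology.best_node(model_id, len(payload), origin_ip)
    if node is not None:
        try:
            # 3. Inference runs on the chosen edge device.
            return node.infer(model_id, payload, timeout=EDGE_TIMEOUT_S)
        except TimeoutError:
            pass  # edge node missed the deadline
    # 4. Fallback: a cloud replica serves the request in the same
    #    response format, so the caller can't tell the difference.
    return cloud.infer(model_id, payload)
```

Because the fallback returns the same response shape as the edge path, the calling code never branches on where inference actually ran.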
## Cloud fallback

**Availability guarantee.** If the edge network can't serve a request (due to capacity, model, or device constraints), cloud-hosted replicas handle it transparently. The trade-off: cloud fallback may have slightly higher latency than edge, but it ensures every request is served. Your integration code doesn't change either way.