

The router decides which edge device handles each request. It evaluates four signals in milliseconds:
| Signal | What it optimizes |
| --- | --- |
| Geographic proximity | Lowest network latency |
| Device capability | Enough compute for the requested model |
| Current load | Avoids overloaded nodes |
| Model availability | Routes to nodes with the NLM already cached |

Request flow

1. **Request arrives.** The API gateway extracts the model identifier, payload size, and origin IP.
2. **Node selected.** The router queries the network topology; the best node that satisfies all constraints wins.
3. **Inference runs.** The request is forwarded to the edge device, where the NLM processes the input and returns the result.
4. **Fallback if needed.** If no edge node responds in time, the request goes to cloud infrastructure. The response format is the same, so your app doesn't know the difference.
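
The fallback step above can be sketched as a timeout race: try the edge path with a deadline, and route to the cloud path if it times out or errors. `edge_fn`, `cloud_fn`, and the 0.5 s budget are stand-ins, not real SDK names.

```python
import concurrent.futures

EDGE_TIMEOUT_S = 0.5  # assumed latency budget before falling back

def route_request(payload, edge_fn, cloud_fn, timeout=EDGE_TIMEOUT_S):
    """Return the edge result if it arrives within `timeout`,
    otherwise serve the same-format response from the cloud path."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(edge_fn, payload)
        try:
            return future.result(timeout=timeout)
        except Exception:
            # Timeout or edge-side error: fall back transparently.
            # Both paths return the same shape, so the caller can't tell.
            return cloud_fn(payload)
```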

Cloud fallback

Availability guarantee: if the edge network can't serve a request (capacity, model, or device constraints), cloud-hosted replicas handle it transparently. The trade-off is that cloud fallback may add slightly more latency than edge, but it ensures 100% availability. Your integration code doesn't change either way.
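
Because fallback is transparent, a client only ever builds one request. Here is a stdlib-only sketch of calling `POST /v1/responses`; only that path comes from this page, while the base URL, auth header, and body fields (`model`, `input`) are assumptions to illustrate the shape.

```python
import json
import urllib.request

def build_request(base_url: str, api_key: str, model: str, input_text: str):
    """Build a POST to /v1/responses. Body fields are hypothetical."""
    body = json.dumps({"model": model, "input": input_text}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/responses",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )

def create_response(req) -> dict:
    # Identical call whether an edge node or a cloud replica answers.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Note that nothing in the client indicates which backend served the response; that is the point of the transparent fallback.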

Distributed Inference

The full architecture behind edge compute.

API Reference

Endpoint spec for /v1/responses.