Most production AI traffic is classification, extraction, routing, and moderation — not creative writing or multi-step reasoning. These tasks don’t need 70B parameters. They need something fast, cheap, and predictable. That’s what Nano Language Models (NLMs) are built for.

## Documentation Index
Fetch the complete documentation index at: https://docs.zerogpu.ai/llms.txt
Use this file to discover all available pages before exploring further.
## NLMs vs LLMs
| | LLMs | NLMs |
|---|---|---|
| Parameters | 7B – 400B+ | Sub-1B |
| Runs on | GPU clusters | CPU, mobile, browser |
| Output | Variable | Predictable, task-specific |
| Cost | High | Low |
| Latency | 100ms – seconds | Single-digit milliseconds |
| Best for | Open-ended generation | Classification, extraction, routing |
## What NLMs handle well
- Content classification — categorize into taxonomies at scale
- Intent routing — map user queries to the right handler
- Entity extraction — pull names, dates, amounts from unstructured text
- Content moderation — flag violations in real time
- Summarization — condense documents and conversations
- Sentiment analysis — positive/negative/neutral at high throughput
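Intent routing, for example, is usually a thin dispatch layer: a classifier maps each query to a label, and the label selects a handler. A minimal sketch of that pattern is below; `classify_intent` is a hypothetical stand-in for an NLM call, and the labels and handlers are illustrative, not part of any API described here.

```python
def classify_intent(query: str) -> str:
    """Hypothetical stand-in for an NLM intent classifier."""
    q = query.lower()
    if "refund" in q or "charge" in q:
        return "billing"
    if "password" in q or "login" in q:
        return "account"
    return "general"

# Map each intent label to its handler (illustrative handlers).
HANDLERS = {
    "billing": lambda q: f"[billing] {q}",
    "account": lambda q: f"[account] {q}",
    "general": lambda q: f"[general] {q}",
}

def route(query: str) -> str:
    """Classify the query, then dispatch to the matching handler."""
    label = classify_intent(query)
    # Fall back to the general handler for unrecognized labels.
    return HANDLERS.get(label, HANDLERS["general"])(query)
```

Because the classifier emits one label from a fixed taxonomy, the dispatch table stays small and every query has a deterministic destination.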
## “Why not just use a small LLM?”
Different architecture, different goals:

- Single-task fine-tuning — every parameter optimized for one job
- CPU-native — quantized and compiled for edge hardware, not adapted from GPU-first designs
- Deterministic output — consistent results production systems can rely on
## Available models
| Model | Use case |
|---|---|
| zlm-v1-summary-cloud | Text summarization |
| zlm-v1-iab-classify-cloud | IAB content classification |
Pass the model name in the `model` field when calling the API.
## API Reference

Send requests to NLMs via `/v1/responses`.
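As an illustration, a request to `/v1/responses` might be assembled as shown below. Only the `/v1/responses` path, the `model` field, and the model names come from this page; the host name (`api.zerogpu.ai`), the `Authorization` header, and the payload shape are assumptions in the style of common inference APIs, so check the full API reference for the exact schema.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.zerogpu.ai"  # assumed host, not confirmed by this page

def build_request(model: str, text: str) -> dict:
    # The model id comes from the Available models table above.
    # The payload shape ({"model": ..., "input": ...}) is an assumption.
    return {"model": model, "input": text}

payload = build_request(
    "zlm-v1-iab-classify-cloud",
    "Review: the battery life on this phone is superb.",
)

req = urllib.request.Request(
    BASE_URL + "/v1/responses",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        # Bearer auth is assumed; ZEROGPU_API_KEY is a hypothetical env var.
        "Authorization": f"Bearer {os.environ.get('ZEROGPU_API_KEY', '')}",
    },
)
# urllib.request.urlopen(req) would send the request; it is omitted here
# so the sketch runs without network access or an API key.
print(json.dumps(payload))
```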
