Most production AI traffic is classification, extraction, routing, and moderation — not creative writing or multi-step reasoning. These tasks don’t need 70B parameters. They need something fast, cheap, and predictable. That’s what Nano Language Models (NLMs) are built for.

NLMs vs LLMs

            LLMs                    NLMs
Parameters  7B – 400B+              Sub-1B
Runs on     GPU clusters            CPU, mobile, browser
Output      Variable                Predictable, task-specific
Cost        High                    Low
Latency     100ms – seconds         Single-digit milliseconds
Best for    Open-ended generation   Classification, extraction, routing

What NLMs handle well

  • Content classification — categorize into taxonomies at scale
  • Intent routing — map user queries to the right handler
  • Entity extraction — pull names, dates, amounts from unstructured text
  • Content moderation — flag violations in real time
  • Summarization — condense documents and conversations
  • Sentiment analysis — positive/negative/neutral at high throughput

"Why not just use a small LLM?"

Different architecture, different goals:
  1. Single-task fine-tuning — every parameter optimized for one job
  2. CPU-native — quantized and compiled for edge hardware, not adapted from GPU-first designs
  3. Deterministic output — consistent results production systems can rely on
Trade-off: NLMs can’t do open-ended generation or complex reasoning. Use LLMs for that. Use NLMs for the high-volume, well-defined tasks that make up 80%+ of production AI traffic.
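The split described above can be sketched as a simple dispatcher. This is an illustrative assumption, not part of the product API: the task names and the `pick_model` helper are hypothetical, invented here to show the routing idea.

```python
# Hedged sketch: send well-defined, high-volume tasks to an NLM and
# everything open-ended to an LLM. Task names are illustrative only.
NLM_TASKS = {
    "classification", "extraction", "routing",
    "moderation", "summarization", "sentiment",
}

def pick_model(task: str) -> str:
    """Return which model family should handle a given task type."""
    if task in NLM_TASKS:
        return "nlm"   # fast, cheap, predictable
    return "llm"       # open-ended generation, complex reasoning

print(pick_model("classification"))  # → nlm
print(pick_model("creative-writing"))  # → llm
```

In practice the boundary is defined by your own traffic: anything with a fixed output schema and high volume is a candidate for the NLM path.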

Available models

Model                       Use case
zlm-v1-summary-cloud        Text summarization
zlm-v1-iab-classify-cloud   IAB content classification
Choose the model in your dashboard and use its identifier in the model field when calling the API.

API Reference

Send requests to NLMs via the /v1/responses endpoint, passing the model identifier from the table above in the model field.
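A minimal sketch of calling the endpoint with the summarization model. The base URL, API key, and the exact request/response fields beyond `model` are assumptions; check your dashboard and the full API reference for the real values.

```python
import json
import urllib.request

# Placeholder values — substitute your real base URL and key (assumption).
API_URL = "https://api.example.com/v1/responses"
API_KEY = "YOUR_API_KEY"

def build_request(model: str, text: str) -> dict:
    """Build a /v1/responses payload. The `input` field name is an
    assumption; only `model` is documented on this page."""
    return {"model": model, "input": text}

def send(payload: dict) -> dict:
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_request("zlm-v1-summary-cloud", "Long document text...")
# response = send(payload)  # uncomment once API_URL and API_KEY are set
```

The same pattern works for classification: swap the model identifier for zlm-v1-iab-classify-cloud.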