Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.zerogpu.ai/llms.txt

Use this file to discover all available pages before exploring further.

Claude Code is Anthropic’s agentic coding tool that runs Claude directly in your terminal, with file editing, command execution, and a plugin system that extends sessions with custom slash commands and skills. Plugins package skills that Claude can auto-invoke when a user request matches, or that users can call explicitly by name. The plugin marketplace lets teams share these extensions across projects. ZeroGPU is a serverless inference platform for nano language models, small, specialized models that run cheaply and quickly behind a single OpenAI-compatible API. It hosts a catalog of models tuned for chat, classification, structured extraction, PII detection, and IAB tagging, with geo-aware routing and a managed Batch API so teams can ship production AI features without standing up GPU infrastructure.

Overview

This guide walks through installing the zerogpu-router plugin, which exposes every ZeroGPU CLI command as a Claude Code skill. You’ll install the CLI, authenticate, load the plugin from its marketplace, and try both auto-invoked and manually-invoked skills. By the end you’ll be able to offload cheap, well-defined NLP tasks (classification, entity and PII extraction, short chat) from Claude to ZeroGPU’s edge-optimized models without leaving your terminal session.

Video walkthrough

Video walkthrough coming soon.

Quickstart

Prerequisites

  • Node.js 20 or newer.
  • Claude Code installed: npm install -g @anthropic-ai/claude-code.
  • A ZeroGPU API key and Project ID.
  • A model ID from the model catalog if you plan to call models by name.

Get your ZeroGPU API key

  1. Sign in to the ZeroGPU dashboard.
  2. Open API Keys and click Create key.
  3. Copy the key (starts with zgpu-api-) and grab your Project ID (UUID) from the project settings page.

Install the ZeroGPU CLI

npm install -g zerogpu-cli
zerogpu --version
Then authenticate so every skill call works without re-prompting:
zerogpu login
You’ll be prompted for your API key and Project ID. For CI or non-interactive setups, pass them as flags:
zerogpu login \
  --api-key zgpu-api-XXXXXXXXXXXXXXXXXX \
  --project-id 4ed3e5bb-c2ed-4d4a-8a66-2b161a27fd1a

Your first request

Start a Claude Code session by running claude in your terminal, then add the marketplace and install the plugin:
/plugin marketplace add zerogpu/zerogpu-router
/plugin install zerogpu-router@zerogpu
/reload-plugins
Confirm it’s loaded with /plugin. You should see zerogpu-router - enabled. Now try a skill:
/zerogpu-router:redact-pii "Email John Smith at [email protected] about invoice 12345."
You’ll get back: Email [PERSON] at [EMAIL] about invoice 12345.

Usage

Every inference skill auto-invokes when Claude detects a matching request, or you can call any skill explicitly with /zerogpu-router:<name> <args>. Two skills (login and status) are manual-only. This section walks through every skill in the plugin: what it does, which ZeroGPU model it routes to, the full argument surface, and the shape of the response you’ll get back.

How auto-invocation works

The plugin ships each skill with a description that Claude reads when deciding whether to invoke it for a given user message. You don’t have to remember slash command names: just describe the task. Claude picks based on intent words like “redact”, “scrub”, “classify by”, “extract”, or “tag this as”, and on the structure of the data you supply (a label list implies zero-shot; a JSON schema with category axes implies structured classification). Three common auto-invoked patterns:
Redact PII from this support ticket before I paste it into our public tracker:

"Hi team, this is Sarah Chen ([email protected], +1 415-555-0182).
Our prod database started throwing connection timeouts around 2:14 AM PT last
night. The on-call engineer Marcus Rivera restarted pgbouncer but the issue
came back. Billing should go to our CFO Priya Patel at
[email protected], billing address 1455 Market St, Suite 600,
San Francisco, CA 94103."
Claude routes to redact-pii. Names, emails, phone numbers, and street addresses come back replaced by uppercase label placeholders like [PERSON], [EMAIL], [PHONE_NUMBER], and [ADDRESS]. The raw PII never enters Claude’s context window, which is the point: you can paste the masked output into a public tracker or log without leaking customer data. Project-specific identifiers (internal hostnames, IPs, contract numbers, card last-fours) aren’t in the model’s label set; strip those yourself, or use extract-entities with custom labels.
Pull all the email addresses and phone numbers out of this:
"Reach Maria at [email protected] or 415-555-0188"
Claude routes to extract-pii, which returns structured JSON grouped by category instead of an in-line mask.
Classify this support ticket by sentiment and topic:
"Support replied quickly but the fix didn't work"
Claude routes to classify-structured and synthesizes a schema with sentiment and topic axes before calling the skill. If you ever want a specific skill instead of letting Claude choose, invoke it explicitly with the /zerogpu-router:<name> syntax shown below.

/zerogpu-router:login

Sign in to ZeroGPU and persist your credentials so every subsequent skill call works without re-prompting. Manual only; not auto-invoked by Claude. Wraps zerogpu login. Synopsis
/zerogpu-router:login [--api-key <key>] [--project-id <id>]
FlagRequiredDescription
--api-key <key>optionalAPI key. Must start with zgpu-api-. If omitted, you’ll be prompted (masked).
--project-id <id>optionalProject ID (UUID v4). If omitted, you’ll be prompted.
On success the API key and Project ID are written to your config file, and ZEROGPU_API_KEY is added to your shell profile so other tools can pick it up.

/zerogpu-router:status

Show your current ZeroGPU sign-in status and the masked API key. Manual only. Wraps zerogpu status. Exit code is 0 when signed in, 1 when not. Use this in scripts that need a hard fail before running any inference skill.

/zerogpu-router:chat

Short, single-turn chat reply for things that don’t need Claude-level reasoning or prior conversation context. Use this when you’d otherwise burn Claude tokens on a one-liner.
  • Model: LFM2.5-1.2B-Instruct
  • Wraps: zerogpu chat
  • When Claude auto-invokes: quick factual answers, one-liners, basic rephrasings where you’ve signalled “use a small model.”
Synopsis
/zerogpu-router:chat <text> [-i <instructions>]
NameRequiredDescription
textyesThe user message / prompt (quoted).
-i, --instructions <instructions>optionalSystem instructions to steer behavior.
Example
/zerogpu-router:chat "Explain WebSockets in two sentences." -i "You are a concise technical writer."
The skill returns raw assistant text (or pretty-printed JSON if the model returned a structured object).

/zerogpu-router:chat-thinking

Same shape as chat, but the model returns its reasoning trace alongside the answer.
  • Model: LFM2.5-1.2B-Thinking
  • Wraps: zerogpu chat_thinking
  • When Claude auto-invokes: short logic / math / word-problem questions where step-by-step reasoning is useful or explicitly requested.
Example
/zerogpu-router:chat-thinking "If a train leaves at 3 PM going 60 mph, when does it cover 150 miles?"

/zerogpu-router:classify-iab

Classify text against the IAB content / audience taxonomy (standard ad-tech category labels).
  • Model: zlm-v1-iab-classify-edge
  • Wraps: zerogpu classify_iab
  • When Claude auto-invokes: “what IAB category is this?”, “tag this article for ad targeting”, “give me the topic taxonomy.”
Example
/zerogpu-router:classify-iab "The Lakers signed a new point guard ahead of the playoffs."
Illustrative output
{
  "categories": [
    { "id": "IAB17-44", "name": "Basketball", "confidence": 0.97 }
  ]
}

/zerogpu-router:classify-iab-enriched

Enriched IAB classification that returns audience categories plus topics, keywords, and inferred user intent.
  • Model: zlm-v1-iab-classify-edge-enriched
  • Wraps: zerogpu classify_iab_enriched
  • When Claude auto-invokes: “give me topics, keywords, and intent”, any request asking for richer ad/audience signals than plain IAB labels.
Example
/zerogpu-router:classify-iab-enriched "Compare the Tesla Model Y and the Hyundai Ioniq 5 for a family of four."
Illustrative output
{
  "categories": [{ "id": "IAB2-1", "name": "Auto Buyers", "confidence": 0.92 }],
  "topics": ["electric vehicles", "family cars"],
  "keywords": ["Tesla Model Y", "Hyundai Ioniq 5"],
  "intent": "comparison-shopping"
}

/zerogpu-router:classify-zero-shot

Zero-shot classification against an arbitrary list of candidate labels you supply at call time. The model picks the single best fit and returns scores for all candidates.
  • Model: deberta-v3-small
  • Wraps: zerogpu classify_zero_shot
  • When Claude auto-invokes: “is this positive, negative, or neutral?”, “tag this as bug, feature, or question.”
Synopsis
/zerogpu-router:classify-zero-shot <text> (-l <label>...) | (--labels a,b,c)
NameRequiredDescription
textyesText to classify (quoted).
-l <label>one of -l / --labelsA single label. Repeatable.
--labels <a,b,c>one of -l / --labelsComma-separated label list.
Example
/zerogpu-router:classify-zero-shot "I love how fast this laptop boots up." -l positive -l negative -l neutral
Illustrative output
{ "label": "positive", "scores": { "positive": 0.94, "neutral": 0.04, "negative": 0.02 } }

/zerogpu-router:classify-structured

Schema-driven, multi-axis classification. You define each category and its allowed labels, and the model returns one chosen label per category.
  • Model: gliner2-base-v1
  • Wraps: zerogpu classify_structured
  • When Claude auto-invokes: “classify by sentiment and topic”, any request that names multiple classification dimensions with explicit label sets.
Synopsis
/zerogpu-router:classify-structured <text> -s '<json schema>'
NameRequiredDescription
textyesText to classify.
-s, --schema <json>yesJSON object mapping each category to its allowed labels.
Example
/zerogpu-router:classify-structured "Support replied quickly but the fix didn't work." \
  -s '{"sentiment":["positive","negative","neutral"],"topic":["support","billing","product"]}'
Illustrative output
{ "sentiment": "negative", "topic": "support" }

/zerogpu-router:extract-entities

Custom-label named-entity recognition. You define the entity labels; the model finds matching spans with confidence scores.
  • Model: gliner2-base-v1
  • Wraps: zerogpu extract_entities
  • When Claude auto-invokes: “extract all people, organizations, and locations from this”, “find every product mention.”
Synopsis
/zerogpu-router:extract-entities <text> (-l <label>... | --labels a,b,c) [-t <0..1>]
NameRequiredDefaultDescription
textyes-Source text.
-l <label> / --labels <a,b,c>yes (one)-Entity labels to extract.
-t, --threshold <number>optional0.3Minimum confidence in [0, 1].
Example
/zerogpu-router:extract-entities "Apple CEO Tim Cook met with Sundar Pichai in Cupertino on Monday." \
  --labels person,organization,location -t 0.4
Illustrative output
[
  { "label": "organization", "text": "Apple",         "score": 0.98 },
  { "label": "person",       "text": "Tim Cook",      "score": 0.97 },
  { "label": "person",       "text": "Sundar Pichai", "score": 0.96 },
  { "label": "location",     "text": "Cupertino",     "score": 0.91 }
]

/zerogpu-router:extract-pii

Extract personally identifiable information entities, grouped by category, without modifying the source text. Use when you need structured data about PII (for redaction policies, audits, or downstream tooling) rather than a masked version.
  • Model: gliner-multi-pii-v1
  • Wraps: zerogpu extract_pii
  • When Claude auto-invokes: “find all PII”, “what personal info is in this?”, “list emails/phones/names.”
Synopsis
/zerogpu-router:extract-pii <text> [-t <threshold>] [(-c | --categories) <list>]
NameRequiredDefaultDescription
textyes-Source text.
-t, --threshold <number>optional0.5Minimum confidence.
-c, --categories <list>optionalidentity,contactComma-separated. Other values: financial, medical, credentials.
Example
/zerogpu-router:extract-pii "Contact Jane Doe at [email protected] or +1 (415) 555-1212." -t 0.6 -c identity,contact,financial
Illustrative output
[
  { "category": "identity", "label": "person", "text": "Jane Doe",          "score": 0.96 },
  { "category": "contact",  "label": "email",  "text": "[email protected]",  "score": 0.99 },
  { "category": "contact",  "label": "phone",  "text": "+1 (415) 555-1212", "score": 0.95 }
]
If you want to mask PII inline rather than extract it, use /zerogpu-router:redact-pii.

/zerogpu-router:redact-pii

Detect PII and replace each span in-line with a [LABEL] placeholder. Use this before sharing or logging sensitive text, or before forwarding user input to another LLM that you don’t want to expose raw PII to.
  • Model: gliner-multi-pii-v1 (with mask: "label")
  • Wraps: zerogpu redact_pii
  • When Claude auto-invokes: “redact”, “scrub”, “mask”, “anonymize”, or “sanitize this for sharing.”
Example
/zerogpu-router:redact-pii "Email John Smith at [email protected] about invoice 12345."
Output
Email [PERSON] at [EMAIL] about invoice 12345.
Note that 12345 is not masked: only spans the model recognizes as PII are replaced. Domain-specific identifiers (account numbers, internal ticket IDs) should be handled by your own redaction layer or by extract-entities with custom labels.

/zerogpu-router:extract-json

Pull specific named fields out of free text into a structured JSON object, defined by a per-field schema. Each field is declared as name::type::description.
  • Model: gliner2-base-v1
  • Wraps: zerogpu extract_json
  • When Claude auto-invokes: “extract the contact info as JSON”, “parse this invoice”, “pull these fields out.”
Synopsis
/zerogpu-router:extract-json <text> -s '<json schema>'
Example
/zerogpu-router:extract-json "Reach Maria Lopez at [email protected] or 415-555-0188." \
  -s '{"contact":["name::str::Full name","email::str::Email address","phone::str::Phone number"]}'
Illustrative output
{
  "contact": {
    "name": "Maria Lopez",
    "email": "[email protected]",
    "phone": "415-555-0188"
  }
}

Skills reference table

Every skill at a glance.
SkillPurposeExample
/zerogpu-router:loginSign in and persist API key + Project ID (manual only)/zerogpu-router:login
/zerogpu-router:statusShow current sign-in status (manual only)/zerogpu-router:status
/zerogpu-router:chat <text>Short chat reply via LFM2.5-1.2B-Instruct/zerogpu-router:chat "Explain WebSockets in two sentences."
/zerogpu-router:chat-thinking <text>Chat with the Thinking variant (returns reasoning)/zerogpu-router:chat-thinking "If a train leaves at 3 PM going 60 mph..."
/zerogpu-router:classify-iab <text>IAB taxonomy classification/zerogpu-router:classify-iab "The Lakers signed a new point guard."
/zerogpu-router:classify-iab-enriched <text>IAB + topics/keywords/intent/zerogpu-router:classify-iab-enriched "Compare the Tesla Model Y..."
/zerogpu-router:classify-zero-shot <text> -l ...Zero-shot against custom labels/zerogpu-router:classify-zero-shot "fast laptop" -l positive -l negative
/zerogpu-router:classify-structured <text> -s '...'Schema-based multi-axis classification/zerogpu-router:classify-structured "ticket text" -s '{"sentiment":[...]}'
/zerogpu-router:extract-entities <text> -l ...Custom-label NER/zerogpu-router:extract-entities "Tim Cook met Sundar Pichai..." -l person -l location
/zerogpu-router:extract-pii <text>Extract PII entities (returns JSON)/zerogpu-router:extract-pii "Contact Jane at [email protected]"
/zerogpu-router:redact-pii <text>Mask PII in-line with [LABEL] placeholders/zerogpu-router:redact-pii "Email John at [email protected]"
/zerogpu-router:extract-json <text> -s '...'Schema-driven JSON extraction/zerogpu-router:extract-json "..." -s '{"contact":["name::str::Full name"]}'
For full flag reference on any command, run zerogpu <command> --help outside the Claude Code session.

Patterns and recipes

Sanitize before Claude sees raw input. Pipe untrusted text through redact-pii first when you don’t want personal data captured in Claude’s transcript or sent to downstream LLMs. Combine with extract-pii if you also need an audit log of what was masked. Cheap router in front of Claude. Use classify-zero-shot or classify-structured to triage an incoming message (bug / feature / question, urgent / normal, in-scope / out-of-scope) and only escalate the hard cases to Claude itself. The classifier call costs orders of magnitude less than a Claude turn. Structured extraction over free-form parsing. When you have semi-structured text (signatures, invoices, contact blocks), prefer extract-json over asking Claude to “parse this into JSON.” It’s deterministic on the schema, faster, and cheaper. Keep field descriptions short and specific - the description is what the model uses to find the span. Confidence thresholds. For NER and PII extraction, the default thresholds (0.3 and 0.5 respectively) are tuned for recall. Raise -t to 0.6 or higher when you need precision (e.g. compliance-grade redaction lists); lower it when you’d rather over-extract and filter downstream.

Troubleshooting

zerogpu: command not found - the CLI isn’t installed or isn’t on your PATH. Run npm install -g zerogpu-cli and restart your shell. If you use a Node version manager (nvm, fnm, volta), make sure the shell that launched Claude Code has the same Node version active. Skill returns “You’re not signed in yet.” - no credentials on disk. Run /zerogpu-router:login inside Claude Code, or zerogpu login in your terminal. Check /zerogpu-router:status to confirm. /zerogpu-router:* skills don’t appear in /help - the plugin isn’t enabled. Run /plugin to view installed plugins, enable zerogpu-router, then /reload-plugins. If it’s not listed at all, re-run /plugin marketplace add zerogpu/zerogpu-router followed by /plugin install zerogpu-router@zerogpu. Request failed with status 401 - your API key is missing, revoked, or mistyped. Rotate the key in the dashboard and re-run /zerogpu-router:login. Keys must start with zgpu-api-. Request failed with status 403 - the key is valid but doesn’t have access to the project, or the project doesn’t have access to the requested model. Confirm your Project ID matches the project that owns the key. Request failed with status 429 - you’re being rate-limited. Back off and retry with exponential delay, or switch heavy workloads to the Batch API, which has separate quotas tuned for bulk jobs. Wrong skill auto-invoked for a request. Claude picks based on phrasing. If you want a specific skill, call it explicitly with /zerogpu-router:<name>. If a skill keeps getting picked when you don’t want it, rephrase to remove the trigger words (“redact”, “classify”, “extract”) or invoke the right one by name. Schema parsing errors on classify-structured / extract-json. The -s flag expects a single-quoted JSON string. On Windows PowerShell, escape inner double quotes or use a here-string. Run the CLI directly (zerogpu classify_structured ... -s '...') to isolate quoting issues from Claude Code. Empty or low-confidence results. Lower -t to surface more candidates, or check that the label set you supplied matches the language of the source text (the underlying models are English-tuned for most label sets). For very short inputs (one or two words), expect lower confidence across the board. /reload-plugins doesn’t pick up a new CLI version. Plugin reload only touches Claude Code state; it doesn’t reinstall the CLI. Run npm install -g zerogpu-cli@latest in your terminal, then zerogpu --version to confirm.

Conclusion

The zerogpu-router plugin turns ZeroGPU’s nano language models into first-class Claude Code skills, so Claude can hand off classification, extraction, and short chat tasks to a cheaper, faster model without you leaving the session. It’s a fast way to keep raw PII out of Claude’s context, cut token spend on well-defined NLP work, and prototype routing patterns you can later promote to production.

Model Catalog

Browse every model the plugin can route to.

API Reference

Explore the full OpenAI-compatible API surface.

Cookbook

Worked examples for classification, extraction, and batch jobs.

Join Discord

Ask questions and share what you’re building.