Claude Code is Anthropic’s agentic coding tool that runs Claude directly in your terminal, with file editing, command execution, and a plugin system that extends sessions with custom slash commands and skills. Plugins package skills that Claude can auto-invoke when a user request matches, or that users can call explicitly by name. The plugin marketplace lets teams share these extensions across projects.
ZeroGPU is a serverless inference platform for nano language models: small, specialized models that run cheaply and quickly behind a single OpenAI-compatible API. It hosts a catalog of models tuned for chat, classification, structured extraction, PII detection, and IAB tagging, with geo-aware routing and a managed Batch API so teams can ship production AI features without standing up GPU infrastructure.
This guide walks through installing the zerogpu-router plugin, which exposes every ZeroGPU CLI command as a Claude Code skill. You’ll install the CLI, authenticate, load the plugin from its marketplace, and try both auto-invoked and manually-invoked skills. By the end you’ll be able to offload cheap, well-defined NLP tasks (classification, entity and PII extraction, short chat) from Claude to ZeroGPU’s edge-optimized models without leaving your terminal session.
Every inference skill auto-invokes when Claude detects a matching request, or you can call any skill explicitly with /zerogpu-router:<name> <args>. Two skills (login and status) are manual-only. This section walks through every skill in the plugin: what it does, which ZeroGPU model it routes to, the full argument surface, and the shape of the response you’ll get back.
The plugin ships each skill with a description that Claude reads when deciding whether to invoke it for a given user message. You don’t have to remember slash command names: just describe the task. Claude picks based on intent words like “redact”, “scrub”, “classify by”, “extract”, or “tag this as”, and on the structure of the data you supply (a label list implies zero-shot; a JSON schema with category axes implies structured classification).
Three common auto-invoked patterns:
Redact PII from this support ticket before I paste it into our public tracker:
"Hi team, this is Sarah Chen ([email protected], +1 415-555-0182). Our prod database started throwing connection timeouts around 2:14 AM PT last night. The on-call engineer Marcus Rivera restarted pgbouncer but the issue came back. Billing should go to our CFO Priya Patel at [email protected], billing address 1455 Market St, Suite 600, San Francisco, CA 94103."
Claude routes to redact-pii. Names, emails, phone numbers, and street addresses come back replaced by uppercase label placeholders like [PERSON], [EMAIL], [PHONE_NUMBER], and [ADDRESS]. The raw PII never enters Claude’s context window, which is the point: you can paste the masked output into a public tracker or log without leaking customer data. Project-specific identifiers (internal hostnames, IPs, contract numbers, card last-fours) aren’t in the model’s label set; strip those yourself, or use extract-entities with custom labels.
Pull all the email addresses and phone numbers out of this:
"Reach Maria at [email protected] or 415-555-0188"
Claude routes to extract-pii, which returns structured JSON grouped by category instead of an in-line mask.
Classify this support ticket by sentiment and topic:
"Support replied quickly but the fix didn't work"
Claude routes to classify-structured and synthesizes a schema with sentiment and topic axes before calling the skill.
If you ever want a specific skill instead of letting Claude choose, invoke it explicitly with the /zerogpu-router:<name> syntax shown below.
Sign in to ZeroGPU and persist your credentials so every subsequent skill call works without re-prompting. Manual only; not auto-invoked by Claude.
Wraps: zerogpu login
Synopsis
API key. Must start with zgpu-api-. If omitted, you’ll be prompted (masked).
--project-id <id>
optional
Project ID (UUID v4). If omitted, you’ll be prompted.
On success the API key and Project ID are written to your config file, and ZEROGPU_API_KEY is added to your shell profile so other tools can pick it up.
Show your current ZeroGPU sign-in status and the masked API key. Manual only.
Wraps: zerogpu status
Exit code is 0 when signed in, 1 when not. Use this in scripts that need a hard fail before running any inference skill.
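A script can use that exit code as a guard before any inference call. The sketch below is a minimal illustration in Python: the helper names `is_signed_in` and `require_signed_in` and the injectable `command` parameter are assumptions made for this example, and only `zerogpu status` and its 0/1 exit codes come from the description above.

```python
import subprocess
import sys


def is_signed_in(command=("zerogpu", "status")):
    """Return True when the status command exits with code 0.

    `command` is injectable (a hypothetical convenience) so the check can be
    exercised without the real CLI installed.
    """
    try:
        result = subprocess.run(
            list(command),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # The CLI is not on PATH; treat that the same as "not signed in".
        return False
    return result.returncode == 0


def require_signed_in(command=("zerogpu", "status")):
    """Stop the calling script with exit status 1 unless signed in."""
    if not is_signed_in(command):
        sys.exit("Not signed in to ZeroGPU; run `zerogpu login` first.")
```

Calling `require_signed_in()` at the top of a batch script gives the hard fail described above without parsing any command output.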
Short, single-turn chat reply for things that don’t need Claude-level reasoning or prior conversation context. Use this when you’d otherwise burn Claude tokens on a one-liner.
Model: LFM2.5-1.2B-Instruct
Wraps: zerogpu chat
When Claude auto-invokes: quick factual answers, one-liners, basic rephrasings where you’ve signalled “use a small model.”
Synopsis
/zerogpu-router:chat <text> [-i <instructions>]
| Name | Required | Description |
| --- | --- | --- |
| text | yes | The user message / prompt (quoted). |
| -i, --instructions <instructions> | optional | System instructions to steer behavior. |
Example
/zerogpu-router:chat "Explain WebSockets in two sentences." -i "You are a concise technical writer."
The skill returns raw assistant text (or pretty-printed JSON if the model returned a structured object).
Zero-shot classification against an arbitrary list of candidate labels you supply at call time. The model picks the single best fit and returns scores for all candidates.
Model: deberta-v3-small
Wraps: zerogpu classify_zero_shot
When Claude auto-invokes: “is this positive, negative, or neutral?”, “tag this as bug, feature, or question.”
Extract personally identifiable information entities, grouped by category, without modifying the source text. Use when you need structured data about PII (for redaction policies, audits, or downstream tooling) rather than a masked version.
Model: gliner-multi-pii-v1
Wraps: zerogpu extract_pii
When Claude auto-invokes: “find all PII”, “what personal info is in this?”, “list emails/phones/names.”
Detect PII and replace each span in-line with a [LABEL] placeholder. Use this before sharing or logging sensitive text, or before forwarding user input to another LLM that you don’t want to expose raw PII to.
Model: gliner-multi-pii-v1 (with mask: "label")
Wraps: zerogpu redact_pii
When Claude auto-invokes: “redact”, “scrub”, “mask”, “anonymize”, or “sanitize this for sharing.”
Example
/zerogpu-router:redact-pii "Email John Smith at [email protected] about invoice 12345."
Output
Email [PERSON] at [EMAIL] about invoice 12345.
Note that 12345 is not masked: only spans the model recognizes as PII are replaced. Domain-specific identifiers (account numbers, internal ticket IDs) should be handled by your own redaction layer or by extract-entities with custom labels.
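One way to build that extra redaction layer is a small regex pass over the already-masked output. The sketch below is illustrative only: the `INVOICE_ID` and `TICKET_ID` labels and both patterns are hypothetical and would need to be adapted to your own identifier formats.

```python
import re

# Hypothetical patterns for identifiers the PII model does not cover.
# Adjust these to your own formats; they are illustrative only.
CUSTOM_PATTERNS = {
    "INVOICE_ID": re.compile(r"(?<=invoice )\d+\b", re.IGNORECASE),
    "TICKET_ID": re.compile(r"\b[A-Z]{2,5}-\d+\b"),
}


def mask_custom_identifiers(text, patterns=CUSTOM_PATTERNS):
    """Replace each pattern match with a [LABEL] placeholder."""
    for label, pattern in patterns.items():
        text = pattern.sub(f"[{label}]", text)
    return text


redacted = "Email [PERSON] at [EMAIL] about invoice 12345."
print(mask_custom_identifiers(redacted))
# Email [PERSON] at [EMAIL] about invoice [INVOICE_ID].
```

Running this after redact-pii keeps the same [LABEL] placeholder style, so downstream readers see one consistent masking format.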
Pull specific named fields out of free text into a structured JSON object, defined by a per-field schema. Each field is declared as name::type::description.
Model: gliner2-base-v1
Wraps: zerogpu extract_json
When Claude auto-invokes: “extract the contact info as JSON”, “parse this invoice”, “pull these fields out.”
/zerogpu-router:extract-json "Reach Maria Lopez at [email protected] or 415-555-0188." \
  -s '{"contact":["name::str::Full name","email::str::Email address","phone::str::Phone number"]}'
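Hand-writing the schema string gets error-prone as the field list grows. As a rough sketch, the Python helper below assembles the same name::type::description entries and quotes the result for a POSIX shell. The function name `build_schema` is hypothetical, and rejecting "::" inside a field part is a defensive assumption rather than a documented rule.

```python
import json
import shlex


def build_schema(group, fields):
    """Build the JSON string passed to -s for extract-json.

    `fields` is a list of (name, type, description) tuples; each becomes a
    "name::type::description" string under the `group` key.
    """
    for parts in fields:
        # Assumption: "::" is the separator, so it must not appear in a part.
        if any("::" in part for part in parts):
            raise ValueError("field parts must not contain '::'")
    entries = [f"{name}::{type_}::{desc}" for name, type_, desc in fields]
    return json.dumps({group: entries}, separators=(",", ":"))


schema = build_schema(
    "contact",
    [
        ("name", "str", "Full name"),
        ("email", "str", "Email address"),
        ("phone", "str", "Phone number"),
    ],
)
# shlex.quote wraps the JSON in single quotes for POSIX shells.
print("-s", shlex.quote(schema))
```

The printed argument matches the schema used in the example command, so it can be pasted directly after the input text.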
Sanitize before Claude sees raw input. Pipe untrusted text through redact-pii first when you don’t want personal data captured in Claude’s transcript or sent to downstream LLMs. Combine with extract-pii if you also need an audit log of what was masked.
Cheap router in front of Claude. Use classify-zero-shot or classify-structured to triage an incoming message (bug / feature / question, urgent / normal, in-scope / out-of-scope) and only escalate the hard cases to Claude itself. The classifier call costs orders of magnitude less than a Claude turn.
Structured extraction over free-form parsing. When you have semi-structured text (signatures, invoices, contact blocks), prefer extract-json over asking Claude to “parse this into JSON.” It’s deterministic on the schema, faster, and cheaper. Keep field descriptions short and specific: the description is what the model uses to find the span.
Confidence thresholds. For NER and PII extraction, the default thresholds (0.3 and 0.5 respectively) are tuned for recall. Raise -t to 0.6 or higher when you need precision (e.g. compliance-grade redaction lists); lower it when you’d rather over-extract and filter downstream.
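The "cheap router" pattern above comes down to one decision: trust the classifier when it is confident, escalate otherwise. The sketch below shows that decision in Python. It assumes the scores have already been parsed into a label-to-score mapping, which may not match the skill's actual response shape, and the `triage` name and 0.6 cutoff are illustrative choices.

```python
def triage(scores, min_confidence=0.6):
    """Pick the top label, or escalate when the classifier is unsure.

    `scores` is assumed to be a mapping of label -> score in [0, 1]; adapt
    the parsing to whatever the skill actually returns.
    """
    if not scores:
        return ("escalate", None, 0.0)
    label, score = max(scores.items(), key=lambda item: item[1])
    if score < min_confidence:
        # Low confidence: hand the message to Claude instead.
        return ("escalate", label, score)
    return ("handled", label, score)


print(triage({"bug": 0.91, "feature": 0.05, "question": 0.04}))
# ('handled', 'bug', 0.91)
print(triage({"bug": 0.41, "feature": 0.39, "question": 0.20}))
# ('escalate', 'bug', 0.41)
```

Returning the label and score even on escalation keeps the classifier's best guess available as a hint for the more expensive call.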
zerogpu: command not found - the CLI isn’t installed or isn’t on your PATH. Run npm install -g zerogpu-cli and restart your shell. If you use a Node version manager (nvm, fnm, volta), make sure the shell that launched Claude Code has the same Node version active.
Skill returns “You’re not signed in yet.” - no credentials on disk. Run /zerogpu-router:login inside Claude Code, or zerogpu login in your terminal. Check /zerogpu-router:status to confirm.
/zerogpu-router:* skills don’t appear in /help - the plugin isn’t enabled. Run /plugin to view installed plugins, enable zerogpu-router, then /reload-plugins. If it’s not listed at all, re-run /plugin marketplace add zerogpu/zerogpu-router followed by /plugin install zerogpu-router@zerogpu.
Request failed with status 401 - your API key is missing, revoked, or mistyped. Rotate the key in the dashboard and re-run /zerogpu-router:login. Keys must start with zgpu-api-.
Request failed with status 403 - the key is valid but doesn’t have access to the project, or the project doesn’t have access to the requested model. Confirm your Project ID matches the project that owns the key.
Request failed with status 429 - you’re being rate-limited. Back off and retry with exponential delay, or switch heavy workloads to the Batch API, which has separate quotas tuned for bulk jobs.
Wrong skill auto-invoked for a request. Claude picks based on phrasing. If you want a specific skill, call it explicitly with /zerogpu-router:<name>. If a skill keeps getting picked when you don’t want it, rephrase to remove the trigger words (“redact”, “classify”, “extract”) or invoke the right one by name.
Schema parsing errors on classify-structured / extract-json. The -s flag expects a single-quoted JSON string. On Windows PowerShell, escape inner double quotes or use a here-string. Run the CLI directly (zerogpu classify_structured ... -s '...') to isolate quoting issues from Claude Code.
Empty or low-confidence results. Lower -t to surface more candidates, or check that the label set you supplied matches the language of the source text (the underlying models are English-tuned for most label sets). For very short inputs (one or two words), expect lower confidence across the board.
/reload-plugins doesn’t pick up a new CLI version. Plugin reload only touches Claude Code state; it doesn’t reinstall the CLI. Run npm install -g zerogpu-cli@latest in your terminal, then zerogpu --version to confirm.
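For the 429 case in the list above, "retry with exponential delay" can be sketched as a small wrapper around whatever function makes the request. Everything named here is an assumption for illustration: `RateLimited` stands in for however your own code signals a 429, and the delay values are arbitrary starting points rather than ZeroGPU recommendations.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical exception raised by your own wrapper on a 429 response."""


def with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                 sleep=time.sleep):
    """Run `call`, retrying on RateLimited with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            # The delay cap doubles each attempt: 1s, 2s, 4s, ... up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter spreads retries so clients do not retry in lockstep.
            sleep(random.uniform(0, delay))
```

The `sleep` parameter is injectable so the retry schedule can be tested without real waiting. For sustained bulk traffic, the Batch API mentioned above is likely a better fit than retry loops.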
The zerogpu-router plugin turns ZeroGPU’s nano language models into first-class Claude Code skills, so Claude can hand off classification, extraction, and short chat tasks to a cheaper, faster model without you leaving the session. It’s a fast way to keep raw PII out of Claude’s context, cut token spend on well-defined NLP work, and prototype routing patterns you can later promote to production.
Model Catalog
Browse every model the plugin can route to.
API Reference
Explore the full OpenAI-compatible API surface.
Cookbook
Worked examples for classification, extraction, and batch jobs.