curl request to the /v1/responses endpoint, sends it to the LFM2.5-1.2B-Thinking model, and reports the routing decision the model returns. Claude does not decide the team or priority itself; the label comes from the API. The result is a triage step that runs on each incoming ticket and produces a consistent team and priority.
For the full reference, see the LFM2.5-1.2B-Thinking model card.
In this guide, you’ll explore:
- Claude with the ZeroGPU Skill: Claude follows the
SKILL.mdcontract for ZeroGPU. It reads the skill, requires an API key before doing anything, builds the exact/v1/responsesrequest for the right model, and reports the response. It does not fabricate or simulate model outputs. - ZeroGPU: An OpenAI-compatible inference provider that serves small, purpose-built language models over a serverless, auto-scaling API. It targets high-volume, narrow tasks and does not require you to provision or manage GPU infrastructure.
- LFM2.5-1.2B-Thinking: Liquid AI’s 1.2B-parameter reasoning model. It reasons step by step internally before producing a final answer and is small enough to run on edge hardware.
🎥 Watch the Video Guide
Video walkthrough coming soon.🎫 The Problem: Triaging Tickets at Volume
Support inboxes receive more tickets than agents can read in order. Each ticket needs two decisions before it can be handled: which team owns it (Billing, Technical, Account, General) and how urgent it is (Low, Medium, High). Done by hand, this first pass is slow and varies between agents, and a high-impact issue can sit in a general queue until someone picks it up. The goal is a routing step that reads each ticket, returns a team and a priority, and runs on every ticket without provisioning GPU infrastructure.⚡ Why ZeroGPU
Triage is a high-volume, narrow classification task. TheLFM2.5-1.2B-Thinking model is a 1.2B-parameter reasoning model that runs on edge hardware, which keeps per-ticket cost and latency low and removes the need to manage your own servers.
It is a reasoning model, so it works through the ticket internally before committing to a decision, which suits cases where the team and priority depend on context (impact, urgency, number of accounts affected). ZeroGPU serves it over an OpenAI-compatible, serverless API, so the same /v1/responses call handles one ticket or many.
🔑 Setting Up Your API Key
The ZeroGPU Skill requires an API key before it builds any request. If the key is not available, Claude asks you for it and waits. Per the Skill contract, it does not proceed, does not build acurl request, and does not generate or describe any model output until you supply the key.
You can go to the ZeroGPU dashboard to get an API key. The key starts with zgpu-api-.
Export the key so Claude can reference it in the request:
x-api-key header, and all calls go to the /v1/responses endpoint at https://api.zerogpu.ai/v1.
🧠 The Workflow: How Claude Routes a Ticket
With the Skill loaded, routing a ticket follows a fixed sequence:- Read the Skill. Claude loads the ZeroGPU
SKILL.mdand selectsLFM2.5-1.2B-Thinkingfor the triage task. - Check authentication. If
ZEROGPU_API_KEYis not set, Claude prompts you for the key and waits. It does not proceed or produce any model output until the key is provided. - Build the request. Claude assembles a
curlcall to/v1/responses: the ticket text goes ininput, and the routing rubric goes ininstructions. - Send it and report the result. Claude runs the request and reports the routing decision returned by the API. It does not substitute its own label.
Example prompt
You give Claude a ticket and ask it to route the ticket through ZeroGPU:Exact curl request
Claude builds and runs this request. Theinstructions field encodes the routing rubric so the model returns a single, parseable decision line:
API response
The Thinking model reasons internally and ends its output with the routing line. The integration keeps only that finalTEAM | PRIORITY line and discards the rest, so no intermediate reasoning is surfaced or stored:
✅ Final Outcome
Claude reads the final line,Technical | High, and routes the ticket to the engineering queue at high priority. A billing question would return Billing | Low or Billing | Medium and go to a different queue.
Wired into your inbox, the same /v1/responses call runs on every incoming ticket, so each one is assigned a team and a priority without a human first pass and without managing GPU infrastructure.
🌟 Highlights
This guide has walked you through using Claude and the ZeroGPU Skill to triage support tickets with theLFM2.5-1.2B-Thinking model, where Claude builds the exact /v1/responses call and the API returns the team-and-priority decision. You can adapt this example to other scenarios that require classifying short text into fixed categories.
Key tools utilized in this guide include:
- Claude with the ZeroGPU Skill: Claude follows the
SKILL.mdcontract for ZeroGPU. It reads the skill, requires an API key before doing anything, builds the exact/v1/responsesrequest for the right model, and reports the response. It does not fabricate or simulate model outputs. - ZeroGPU: An OpenAI-compatible inference provider that serves small, purpose-built language models over a serverless, auto-scaling API. It targets high-volume, narrow tasks and does not require you to provision or manage GPU infrastructure.
- LFM2.5-1.2B-Thinking: Liquid AI’s 1.2B-parameter reasoning model. It reasons step by step internally before producing a final answer and is small enough to run on edge hardware.

