Chat completions
OpenAI-style chat inference for models that use this route.
messages array) instead of the Responses input field. The dashboard and model catalog indicate which route applies. Each model page has its own playground with the right examples for that model.
Install the official SDK from npm or PyPI (pip install zerogpu-api). Source: zerogpu/SDK.
Response JSON shape depends on the model; handle errors the same way as API error codes.Authorizations
Your ZeroGPU API key. Create one in the dashboard under API keys. Send it on every request.
The UUID of the project the request is billed to. Find it in the dashboard project settings.
Body
Model identifier. Open a model page for a dedicated playground with the correct body for that model.
llama-3.1-8b-instruct-fast, LFM2.5-1.2B-Instruct, LFM2.5-1.2B-Thinking, deberta-v3-small, gliner2-base-v1, gliner-multi-pii-v1, zlm-v1-followup-questions-edge, zlm-v1-iab-classify-edge, zlm-v1-iab-classify-edge-enriched "llama-3.1-8b-instruct-fast"
Ordered list of messages making up the conversation so far.
1Optional model-specific parameters, passed through to the model. For example, PII models accept mask and usecase. See the relevant model page for supported keys.
Response
Success
An OpenAI-compatible chat completion.
Unique identifier for the completion.
"chatcmpl_abc123"
Object type. Always chat.completion.
"chat.completion"
Unix timestamp (seconds) when the completion was created.
1710000000
The model used for inference.
"llama-3.1-8b-instruct-fast"
List of completion choices.
Token usage statistics for the request.

