Responses
Send input to an AI model and receive a response.
What comes back depends on the model: generated text, a classification, extracted fields, redacted output, and more. The input and output shapes vary by model, so open a model page for a prefilled playground. For the messages-style route, see Chat completions.
Install the official SDK from npm or PyPI (pip install zerogpu-api). Source: zerogpu/SDK. Handle errors the same way as API error codes.
Authorizations
Your ZeroGPU API key. Create one in the dashboard under API keys. Send it on every request.
The UUID of the project the request is billed to. Find it in the dashboard project settings.
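The two credentials above can be assembled once and reused on every request. The sketch below is illustrative only: this page does not show the header names, so `x-api-key` and `x-project-id` are hypothetical placeholders, and `build_headers` is a helper invented for this example rather than part of the SDK. Check the dashboard or the SDK source for the real names.

```python
import uuid


def build_headers(api_key: str, project_id: str) -> dict[str, str]:
    """Return headers carrying the API key and the billing project ID.

    Header names are assumptions; they are not confirmed by this page.
    """
    if not api_key:
        raise ValueError("API key is required")
    # uuid.UUID raises ValueError if project_id is not a well-formed UUID.
    normalized = str(uuid.UUID(project_id))
    return {
        "x-api-key": api_key,        # assumed header name
        "x-project-id": normalized,  # assumed header name
        "Content-Type": "application/json",
    }
```

Validating the project ID locally catches a mistyped UUID before the request is sent.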
Body
Model identifier. Open a model page for a dedicated playground with the correct body for that model.
Available options: llama-3.1-8b-instruct-fast, LFM2.5-1.2B-Instruct, LFM2.5-1.2B-Thinking, deberta-v3-small, gliner2-base-v1, gliner-multi-pii-v1, zlm-v1-followup-questions-edge, zlm-v1-iab-classify-edge, zlm-v1-iab-classify-edge-enriched
"llama-3.1-8b-instruct-fast"
The text or document to send, as a plain string.
String length: 1 - 131072
Response format configuration.
Optional system-style instructions applied on top of input, for models that support them.
Optional model-specific parameters, passed through to the model. For example, PII models accept mask and usecase. See the relevant model page for supported keys.
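As a rough illustration of the body fields described above, the following sketch builds a request body and enforces the 1 - 131072 bound on the input string, read here as a length in characters. The JSON key names are assumptions: `model` and `input` follow the wording of this page, while `instructions` and `params` are hypothetical names for the two optional fields. `build_body` is a helper made up for this example.

```python
import json

MAX_INPUT_CHARS = 131072  # upper bound listed for the input string


def build_body(model, input_text, instructions=None, params=None):
    """Build a request body dict; optional fields are omitted when unset."""
    if not 1 <= len(input_text) <= MAX_INPUT_CHARS:
        raise ValueError(f"input must be 1 to {MAX_INPUT_CHARS} characters")
    body = {"model": model, "input": input_text}
    if instructions is not None:
        body["instructions"] = instructions  # assumed key name
    if params:
        body["params"] = params  # assumed key name
    return body


# The mask value is a placeholder; see the model page for the supported keys
# and value types.
pii_body = build_body(
    "gliner-multi-pii-v1",
    "Call me at 555-0100.",
    params={"mask": True},
)
payload = json.dumps(pii_body)
```

Leaving unset optional fields out of the body, rather than sending nulls, keeps the payload valid for models that do not support them.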
Response
Success
The generated model response.
Unique identifier for the response.
"resp_abc123"
Object type. Always response.
"response"
Unix timestamp (seconds) when the response was created.
1710000000
The model used for inference.
"llama-3.1-8b-instruct-fast"
Output message objects produced by the model.
Token usage statistics for the request.
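To make the response fields concrete, here is a minimal sketch that reads a response shaped like the examples above. The key names (`id`, `object`, `created_at`, `model`, `output`, `usage`) are assumptions, since this page lists the descriptions and example values without the field names, and the empty `output` and `usage` values are placeholders rather than real model output.

```python
from datetime import datetime, timezone

# Sample built from the example values on this page; key names are assumed.
sample = {
    "id": "resp_abc123",
    "object": "response",
    "created_at": 1710000000,
    "model": "llama-3.1-8b-instruct-fast",
    "output": [],  # placeholder: output message objects vary by model
    "usage": {},   # placeholder: token usage statistics
}


def summarize(resp: dict) -> dict:
    """Check the object type and convert the Unix timestamp to UTC."""
    if resp.get("object") != "response":
        raise ValueError("unexpected object type")
    created = datetime.fromtimestamp(resp["created_at"], tz=timezone.utc)
    return {
        "id": resp["id"],
        "model": resp["model"],
        "created": created.isoformat(),
        "output_count": len(resp.get("output", [])),
    }


summary = summarize(sample)
print(summary["created"])  # prints 2024-03-09T16:00:00+00:00
```

Because the output shape depends on the model, the sketch only counts the output items instead of assuming their structure.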

