POST /v1/responses
Send a list of input messages to an AI model and receive a generated response.
Request headers
| Header | Type | Required | Description |
|---|---|---|---|
| x-api-key | string | Yes | Your ZeroGPU API key |
| x-project-id | string | Yes | Your project UUID |
| content-type | string | Yes | Must be application/json |
Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The model identifier (available from your dashboard) |
| input | array | Yes | Array of input message objects |
| text | object | No | Response format configuration |
Input message object
| Field | Type | Description |
|---|---|---|
| role | string | The role of the message author: user or system |
| content | string | The content of the message |
Text format object
| Field | Type | Description |
|---|---|---|
| text.format.type | string | Response format type (e.g., text) |
Example request
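A minimal request sketch in Python using only the standard library. The base URL, model identifier, and credentials below are placeholders (this section does not specify a base URL); only the model, input, and optional text parameters from the tables above are used.

```python
import json
import urllib.request

# Placeholder values -- substitute your real key, project UUID, and model id.
API_KEY = "zg_your_api_key"
PROJECT_ID = "00000000-0000-0000-0000-000000000000"
BASE_URL = "https://api.zerogpu.example"  # assumed base URL, not given in this doc

payload = {
    "model": "your-model-id",  # from your dashboard
    "input": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the plot of Hamlet in one sentence."},
    ],
    "text": {"format": {"type": "text"}},  # optional response format
}

request = urllib.request.Request(
    url=f"{BASE_URL}/v1/responses",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "x-api-key": API_KEY,
        "x-project-id": PROJECT_ID,
        "content-type": "application/json",
    },
    method="POST",
)

# Uncomment to send (requires valid credentials):
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```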
Example response
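An illustrative response body (all values invented for the sketch), parsed with the field names from the reference table that follows. The generated text lives at output[].content[].text:

```python
import json

# Invented example body; field names follow the response-field reference.
raw = """
{
  "id": "resp_abc123",
  "object": "response",
  "created": 1714089600,
  "model": "your-model-id",
  "output": [
    {
      "role": "assistant",
      "content": [
        {"text": "Hamlet, a Danish prince, feigns madness while avenging his father's murder."}
      ]
    }
  ],
  "usage": {"input_tokens": 27, "output_tokens": 18, "total_tokens": 45}
}
"""

response = json.loads(raw)
text = response["output"][0]["content"][0]["text"]
print(text)
```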
Response fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the response |
| object | string | Object type (response) |
| created | integer | Unix timestamp of when the response was created |
| model | string | The model used for inference |
| output | array | Array of output message objects |
| output[].role | string | Always assistant |
| output[].content[].text | string | The generated text response |
| usage | object | Token usage statistics |
| usage.input_tokens | integer | Number of tokens in the input |
| usage.output_tokens | integer | Number of tokens generated |
| usage.total_tokens | integer | Total tokens consumed |
Context length
Each model has a maximum input token limit. If your input exceeds it:
- The API may return 420 with error.code context_length_exceeded when the model is configured to reject over-length input.
- Otherwise, the input may be truncated to the limit, and the response will include usage for the truncated input.
Handle both 420 and truncation in your client.
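A sketch of distinguishing the two outcomes on the client side. It assumes the error body carries error.code as described above (the full error schema is not shown in this section), and that the caller supplies its own approximate count of the tokens it sent, since the exact tokenizer is model-specific:

```python
def classify_response(status, body, sent_input_tokens):
    """Classify an API result as 'rejected', 'truncated', or 'ok'.

    status: HTTP status code of the response.
    body: parsed JSON of either a success or an error response.
    sent_input_tokens: the caller's own (approximate) input token count.
    """
    # Over-length input rejected outright.
    if status == 420 and body.get("error", {}).get("code") == "context_length_exceeded":
        return "rejected"
    # Silent truncation: reported input usage is lower than what was sent.
    if body["usage"]["input_tokens"] < sent_input_tokens:
        return "truncated"
    return "ok"
```

A caller would invoke this after each request and, for example, retry with a shorter input on "rejected" or warn the user on "truncated".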
