Documentation Index
Fetch the complete documentation index at: https://docs.zerogpu.ai/llms.txt
Use this file to discover all available pages before exploring further.
POST /v1/responses
Send input to an AI model and receive a generated response.
Depending on the model, input may be:
- a plain string (common for many production models), or
- an array of message objects, each with a role (user or system) and content (string).

Use the format your model expects; if you receive a 400 error with code invalid_type on input, switch between these shapes.
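Because the accepted input shape is model-dependent, it helps to build it behind one helper. A minimal sketch (the function name is illustrative, not part of the API):

```python
def build_input(text, as_messages):
    """Build the request 'input' field in either shape the API accepts."""
    if as_messages:
        # Message-list shape: an array of {role, content} objects
        return [{"role": "user", "content": text}]
    # Plain-string shape: the full user text as a single JSON string value
    return text
```

If a request fails with 400 invalid_type on input, retry with the other value of as_messages.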
Headers

| Header | Type | Required | Description |
|---|---|---|---|
| x-api-key | string | Yes | Your ZeroGPU API key |
| x-project-id | string | Yes | Your project UUID |
| content-type | string | Yes | Must be application/json |
Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The model identifier (available from your dashboard) |
| input | string or array | Yes | Plain text, or a list of input message objects (model-dependent) |
| text | object | No | Response format configuration |
| metadata | object | No | Optional model-specific parameters (e.g., PII masking, use case) when the model supports them |
See also Chat completions for the messages route.
Input message object

| Field | Type | Description |
|---|---|---|
| role | string | The role of the message author: user or system |
| content | string | The content of the message |
When input is a string, send the full user text or document as a single JSON string value.
Text format object
| Field | Type | Description |
|---|---|---|
| text.format.type | string | Response format type (e.g., text) |
Example request
```bash
curl --location 'https://api.zerogpu.ai/v1/responses' \
--header 'content-type: application/json' \
--header 'x-api-key: YOUR_API_KEY' \
--header 'x-project-id: YOUR_PROJECT_ID' \
--data '{
  "model": "YOUR_MODEL",
  "input": [
    {
      "role": "user",
      "content": "Your input text here..."
    }
  ],
  "text": {
    "format": {
      "type": "text"
    }
  }
}'
```
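The same request can be assembled in Python. A minimal sketch using only the headers and body fields documented above (the function name is illustrative; send the result with any HTTP client):

```python
import json

API_URL = "https://api.zerogpu.ai/v1/responses"

def build_request(model, user_text, api_key, project_id):
    """Assemble the headers and JSON body for POST /v1/responses."""
    headers = {
        "content-type": "application/json",
        "x-api-key": api_key,
        "x-project-id": project_id,
    }
    body = {
        "model": model,
        # Message-list input shape; some models expect a plain string instead
        "input": [{"role": "user", "content": user_text}],
        "text": {"format": {"type": "text"}},
    }
    return headers, json.dumps(body)
```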
Some models expect input as a single string instead of a message list:
```json
{
  "model": "YOUR_MODEL",
  "input": "Your full prompt or document text here...",
  "text": {
    "format": { "type": "text" }
  }
}
```
Example response
```json
{
  "id": "resp_abc123",
  "object": "response",
  "created": 1710000000,
  "model": "your-selected-model",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The generated response from the model..."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 24,
    "output_tokens": 32,
    "total_tokens": 56
  }
}
```
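To pull the generated text out of a response like the one above, a client can walk output[].content[] and collect the output_text parts. A sketch (the helper name is illustrative):

```python
def extract_output_text(response):
    """Concatenate all output_text parts from a /v1/responses payload."""
    parts = []
    for message in response.get("output", []):
        for part in message.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)
```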
Response fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the response |
| object | string | Object type (response) |
| created | integer | Unix timestamp of when the response was created |
| model | string | The model used for inference |
| output | array | Array of output message objects |
| output[].role | string | Always assistant |
| output[].content[].text | string | The generated text response |
| usage | object | Token usage statistics |
| usage.input_tokens | integer | Number of tokens in the input |
| usage.output_tokens | integer | Number of tokens generated |
| usage.total_tokens | integer | Total tokens consumed |
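For billing or logging, the usage object can be summarized client-side. A small sketch using exactly the fields documented above (the function name is illustrative):

```python
def summarize_usage(response):
    """Return a one-line summary of token usage from a response payload."""
    usage = response.get("usage", {})
    inp = usage.get("input_tokens", 0)
    out = usage.get("output_tokens", 0)
    # total_tokens should equal input + output; fall back to the sum if absent
    total = usage.get("total_tokens", inp + out)
    return f"{inp} in + {out} out = {total} total"
```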
Context length
Each model has a maximum input token limit. If your input exceeds it:
- The API may return 420 with error.code context_length_exceeded when the model is configured to reject over-length input.
- Otherwise the input may be truncated to the limit, and the response will include usage for the truncated input.

Keep requests within the model's token limit, or handle 420 and truncation in your client.
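A client-side check for both failure modes might look like this. A sketch (the function names are illustrative; truncation is inferred from the documented behavior that usage reflects the truncated input):

```python
def is_context_length_error(status_code, body):
    """Detect the documented 420 rejection for over-length input."""
    return (
        status_code == 420
        and body.get("error", {}).get("code") == "context_length_exceeded"
    )

def was_truncated(expected_input_tokens, body):
    """Heuristic: usage.input_tokens below what was sent suggests truncation."""
    return body.get("usage", {}).get("input_tokens", 0) < expected_input_tokens
```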