Documentation Index
Fetch the complete documentation index at: https://docs.zerogpu.ai/llms.txt
Use this file to discover all available pages before exploring further.
POST /v1/responses
Send input to an AI model and receive a generated response.
Depending on the model, input may be:
- a plain string (common for many production models), or
- an array of message objects, each with a role (user or system) and content (string).

Use the format your model expects; if you receive a 400 error with code invalid_type on input, switch between these shapes.
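Because the accepted input shape is model-dependent, it helps to build it behind one helper. A minimal sketch (the function name is illustrative, not part of the API):

```python
def build_input(text, as_messages):
    """Build the request 'input' field in either shape the API accepts."""
    if as_messages:
        # Message-list shape: an array of {role, content} objects
        return [{"role": "user", "content": text}]
    # Plain-string shape: the full user text as a single JSON string value
    return text
```

If a request fails with 400 invalid_type on input, retry with the other value of as_messages.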
Headers

| Header | Type | Required | Description |
|---|---|---|---|
| x-api-key | string | Yes | Your ZeroGPU API key |
| x-project-id | string | Yes | Your project UUID |
| content-type | string | Yes | Must be application/json |
Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The model identifier (available from your dashboard) |
| input | string or array | Yes | Plain text, or a list of input message objects (model-dependent) |
| text | object | No | Response format configuration |
| metadata | object | No | Optional model-specific parameters (e.g., PII masking, use case) when the model supports them |
See also Chat completions for the messages route.
Input message object

| Field | Type | Description |
|---|---|---|
| role | string | The role of the message author: user or system |
| content | string | The content of the message |
When input is a string, send the full user text or document as a single JSON string value.
Text format object
| Field | Type | Description |
|---|---|---|
| text.format.type | string | Response format type (e.g., text) |
Example request
```bash
curl --location 'https://api.zerogpu.ai/v1/responses' \
--header 'content-type: application/json' \
--header 'x-api-key: YOUR_API_KEY' \
--header 'x-project-id: YOUR_PROJECT_ID' \
--data '{
  "model": "YOUR_MODEL",
  "input": [
    {
      "role": "user",
      "content": "Your input text here..."
    }
  ],
  "text": {
    "format": {
      "type": "text"
    }
  }
}'
```
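The same request can be assembled in Python. A minimal sketch using only the headers and body fields documented above (the function name is illustrative; send the result with any HTTP client):

```python
import json

API_URL = "https://api.zerogpu.ai/v1/responses"

def build_request(model, user_text, api_key, project_id):
    """Assemble the headers and JSON body for POST /v1/responses."""
    headers = {
        "content-type": "application/json",
        "x-api-key": api_key,
        "x-project-id": project_id,
    }
    body = {
        "model": model,
        # Message-list input shape; some models expect a plain string instead
        "input": [{"role": "user", "content": user_text}],
        "text": {"format": {"type": "text"}},
    }
    return headers, json.dumps(body)
```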
Some models expect input as a single string instead of a message list:
```json
{
  "model": "YOUR_MODEL",
  "input": "Your full prompt or document text here...",
  "text": {
    "format": { "type": "text" }
  }
}
```
Example response
```json
{
  "id": "resp_abc123",
  "object": "response",
  "created": 1710000000,
  "model": "your-selected-model",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The generated response from the model..."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 24,
    "output_tokens": 32,
    "total_tokens": 56
  }
}
```
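To pull the generated text out of a response like the one above, a client can walk output[].content[] and collect the output_text parts. A sketch (the helper name is illustrative):

```python
def extract_output_text(response):
    """Concatenate all output_text parts from a /v1/responses payload."""
    parts = []
    for message in response.get("output", []):
        for part in message.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)
```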
Response fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the response |
| object | string | Object type (response) |
| created | integer | Unix timestamp of when the response was created |
| model | string | The model used for inference |
| output | array | Array of output message objects |
| output[].role | string | Always assistant |
| output[].content[].text | string | The generated text response |
| usage | object | Token usage statistics |
| usage.input_tokens | integer | Number of tokens in the input |
| usage.output_tokens | integer | Number of tokens generated |
| usage.total_tokens | integer | Total tokens consumed |
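For billing or logging, the usage object can be summarized client-side. A small sketch using exactly the fields documented above (the function name is illustrative):

```python
def summarize_usage(response):
    """Return a one-line summary of token usage from a response payload."""
    usage = response.get("usage", {})
    inp = usage.get("input_tokens", 0)
    out = usage.get("output_tokens", 0)
    # total_tokens should equal input + output; fall back to the sum if absent
    total = usage.get("total_tokens", inp + out)
    return f"{inp} in + {out} out = {total} total"
```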
Context length
Each model has a maximum input token limit. If your input exceeds it:
- The API may return 420 with error.code context_length_exceeded when the model is configured to reject over-length input.
- Otherwise the input may be truncated to the limit, and the response will include usage for the truncated input.

Keep requests within the model's token limit, or handle 420 and truncation in your client.
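A client-side check for both failure modes might look like this. A sketch (the function names are illustrative; truncation is inferred from the documented behavior that usage reflects the truncated input):

```python
def is_context_length_error(status_code, body):
    """Detect the documented 420 rejection for over-length input."""
    return (
        status_code == 420
        and body.get("error", {}).get("code") == "context_length_exceeded"
    )

def was_truncated(expected_input_tokens, body):
    """Heuristic: usage.input_tokens below what was sent suggests truncation."""
    return body.get("usage", {}).get("input_tokens", 0) < expected_input_tokens
```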