

POST /v1/responses

Send input to an AI model and receive a generated response.

input shape

Depending on the model, input may be:
  • a plain string (common for many production models), or
  • an array of message objects, each with a role (user or system) and string content.
Use the shape your model expects; if a request fails with a 400 error and error code invalid_type on input, retry with the other shape.
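When probing which shape a model expects, one option is to fall back to the alternate shape on that error. A minimal sketch (the helper name and conversion logic are illustrative, not part of the API):

```python
def alternate_shape(input_value):
    """Convert between the two accepted input shapes.

    A plain string becomes a single user message list; a message list
    is flattened back to one newline-joined string.
    """
    if isinstance(input_value, str):
        return [{"role": "user", "content": input_value}]
    return "\n".join(m["content"] for m in input_value)
```

On a 400 invalid_type error, resend the same request body with input replaced by alternate_shape(input).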

Request headers

Header        Type    Required  Description
x-api-key     string  Yes       Your ZeroGPU API key
x-project-id  string  Yes       Your project UUID
content-type  string  Yes       Must be application/json

Request body

Parameter  Type             Required  Description
model      string           Yes       The model identifier (available from your dashboard)
input      string or array  Yes       Plain text, or a list of input message objects (model-dependent)
text       object           No        Response format configuration
metadata   object           No        Optional model-specific parameters (e.g. PII masking, use case) when the model supports them

See also Chat completions for the messages route.

Each input message object has:

Field    Type    Description
role     string  The role of the message author: user or system
content  string  The content of the message

Plain string input

When input is a string, send the full user text or document as a single JSON string value.

Text format object

Field             Type    Description
text.format.type  string  Response format type (e.g., text)

Example request

curl --location 'https://api.zerogpu.ai/v1/responses' \
  --header 'content-type: application/json' \
  --header 'x-api-key: YOUR_API_KEY' \
  --header 'x-project-id: YOUR_PROJECT_ID' \
  --data '{
    "model": "YOUR_MODEL",
    "input": [
      {
        "role": "user",
        "content": "Your input text here..."
      }
    ],
    "text": {
      "format": {
        "type": "text"
      }
    }
  }'

Example: string input

Some models expect input as a single string instead of a message list:
{
  "model": "YOUR_MODEL",
  "input": "Your full prompt or document text here...",
  "text": {
    "format": { "type": "text" }
  }
}
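Either request can also be assembled from Python with only the standard library. This is a sketch: the endpoint, headers, and body fields come from the examples above, while build_request and the placeholder credentials are illustrative.

```python
import json
import urllib.request

API_URL = "https://api.zerogpu.ai/v1/responses"

def build_request(api_key, project_id, model, input_value):
    """Assemble headers and an encoded JSON body for POST /v1/responses."""
    headers = {
        "content-type": "application/json",
        "x-api-key": api_key,
        "x-project-id": project_id,
    }
    body = {
        "model": model,
        "input": input_value,  # plain string or message list, per your model
        "text": {"format": {"type": "text"}},
    }
    return headers, json.dumps(body).encode("utf-8")

headers, data = build_request(
    "YOUR_API_KEY", "YOUR_PROJECT_ID", "YOUR_MODEL",
    [{"role": "user", "content": "Your input text here..."}],
)
# To actually send the request (requires network and valid credentials):
# req = urllib.request.Request(API_URL, data=data, headers=headers, method="POST")
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```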

Example response

{
  "id": "resp_abc123",
  "object": "response",
  "created": 1710000000,
  "model": "your-selected-model",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The generated response from the model..."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 24,
    "output_tokens": 32,
    "total_tokens": 56
  }
}

Response fields

Field                    Type     Description
id                       string   Unique identifier for the response
object                   string   Object type (response)
created                  integer  Unix timestamp of when the response was created
model                    string   The model used for inference
output                   array    Array of output message objects
output[].role            string   Always assistant
output[].content[].text  string   The generated text response
usage                    object   Token usage statistics
usage.input_tokens       integer  Number of tokens in the input
usage.output_tokens      integer  Number of tokens generated
usage.total_tokens       integer  Total tokens consumed
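Given the shape above, the generated text can be pulled out of a parsed response like this (a sketch; extract_text is a hypothetical helper, not part of any SDK):

```python
def extract_text(response):
    """Concatenate all output_text parts from a /v1/responses payload."""
    parts = []
    for message in response.get("output", []):
        for item in message.get("content", []):
            if item.get("type") == "output_text":
                parts.append(item.get("text", ""))
    return "".join(parts)
```

Iterating over every output message and content part, rather than indexing output[0] directly, keeps the helper robust if a model returns more than one message.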

Context length

Each model has a maximum input token limit. If your input exceeds it:
  • The API may return 420 with error.code context_length_exceeded when the model is configured to reject over-length input.
  • Otherwise the input may be truncated to the limit and the response will include usage for the truncated input.
Keep requests within the model’s token limit or handle 420 and truncation in your client.
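Both failure modes can be handled client-side along these lines. The ~4-characters-per-token heuristic is a rough assumption, not the model's actual tokenizer, and the helper names are illustrative; only the 420 status and error.code value come from the documentation above.

```python
def clip_to_limit(text, max_tokens, chars_per_token=4):
    """Rough client-side guard against over-length input.

    Assumes ~4 characters per token (a heuristic -- the model's real
    tokenizer may count differently), so leave headroom below the limit.
    """
    return text[: max_tokens * chars_per_token]

def is_context_length_error(status, error_body):
    """Detect the documented over-length rejection (HTTP 420)."""
    return (
        status == 420
        and error_body.get("error", {}).get("code") == "context_length_exceeded"
    )
```

For models that truncate instead of rejecting, compare usage.input_tokens in the response against your own estimate to detect that truncation occurred.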