Create response
POST /responses
curl --request POST \
  --url https://api.zerogpu.ai/v1/responses \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --header 'x-project-id: <project-id>' \
  --data '
{
  "model": "llama-3.1-8b-instruct-fast",
  "input": "NASA announced that its Artemis III mission is now scheduled for late 2026, marking the first time astronauts will land on the lunar surface since Apollo 17 in 1972. The mission will send a crew of four to the Moon aboard the Orion spacecraft, with two astronauts descending to the south pole using SpaceX Starship as a lunar lander. Scientists are particularly excited about exploring permanently shadowed craters that may contain water ice, which could be critical for sustaining long-term human presence on the Moon."
}
'
{
  "id": "resp_abc123",
  "object": "response",
  "created": 1710000000,
  "model": "llama-3.1-8b-instruct-fast",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Artemis III, slated for late 2026, will return astronauts to the Moon for the first time since 1972, landing two crew at the south pole to study shadowed craters that may hold water ice."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 112,
    "output_tokens": 44,
    "total_tokens": 156
  }
}
One endpoint serves every model. Pass text or a document as input; what comes back depends on the model: generated text, a classification, extracted fields, redacted output, and more. Because the input and output shapes vary by model, open a model page for a prefilled playground. For the messages-style route, see Chat completions.

Install the official SDK from npm or PyPI (pip install zerogpu-api). Source: zerogpu/SDK. For error handling, see API error codes.
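
For readers who prefer not to use curl or the SDK, the request at the top of this page can be sketched with Python's standard library alone. This is an illustrative sketch, not the official SDK: `build_request` and `send` are hypothetical helper names, and only the URL, header names, and body fields come from this page.

```python
import json
import urllib.request

# URL taken from the curl example on this page.
API_URL = "https://api.zerogpu.ai/v1/responses"


def build_request(api_key: str, project_id: str, model: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
            "x-project-id": project_id,
        },
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body.

    urllib raises urllib.error.HTTPError for 4xx/5xx status codes.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping construction and sending separate makes it possible to inspect or log the request before any network call is made.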

Authorizations

x-api-key
string
header
required

Your ZeroGPU API key. Create one in the dashboard under API keys. Send it on every request.

x-project-id
string
header
required

The UUID of the project the request is billed to. Find it in the dashboard project settings.
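
Because both headers are required on every request, it can help to load and check them once. The sketch below assumes the credentials live in environment variables named `ZEROGPU_API_KEY` and `ZEROGPU_PROJECT_ID`; those names and the `auth_headers` helper are hypothetical, not something this API defines.

```python
import os
import uuid


def auth_headers(env=None) -> dict:
    """Return the two required headers, failing early on obviously bad values.

    The environment variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("ZEROGPU_API_KEY", "").strip()
    project_id = env.get("ZEROGPU_PROJECT_ID", "").strip()

    if not api_key:
        raise ValueError("ZEROGPU_API_KEY is empty or not set")

    # The project id is documented as a UUID. uuid.UUID() parses leniently
    # (for example, it also accepts the form without hyphens), so this is a
    # sanity check rather than strict validation.
    try:
        uuid.UUID(project_id)
    except ValueError:
        raise ValueError("ZEROGPU_PROJECT_ID must be a UUID") from None

    return {"x-api-key": api_key, "x-project-id": project_id}
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real environment.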

Body

application/json
model
enum<string>
required

Model identifier. Open a model page for a dedicated playground with the correct body for that model.

Available options:
llama-3.1-8b-instruct-fast,
LFM2.5-1.2B-Instruct,
LFM2.5-1.2B-Thinking,
deberta-v3-small,
gliner2-base-v1,
gliner-multi-pii-v1,
zlm-v1-followup-questions-edge,
zlm-v1-iab-classify-edge,
zlm-v1-iab-classify-edge-enriched
Example:

"llama-3.1-8b-instruct-fast"

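Since `model` is an enum, a client can reject unknown identifiers before sending anything. The sketch below copies the options listed above into a set; `KNOWN_MODELS` and `check_model` are hypothetical names, and the list is only a snapshot of this page, so the server remains the authority on which models exist.

```python
# Snapshot of the options listed on this page; it may drift from the live API.
KNOWN_MODELS = frozenset({
    "llama-3.1-8b-instruct-fast",
    "LFM2.5-1.2B-Instruct",
    "LFM2.5-1.2B-Thinking",
    "deberta-v3-small",
    "gliner2-base-v1",
    "gliner-multi-pii-v1",
    "zlm-v1-followup-questions-edge",
    "zlm-v1-iab-classify-edge",
    "zlm-v1-iab-classify-edge-enriched",
})


def check_model(name: str) -> str:
    """Return the name unchanged if it is listed, else raise ValueError.

    The comparison is exact: identifiers are listed in mixed case
    (for example "LFM2.5-1.2B-Instruct"), so no case folding is applied.
    """
    if name not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {name!r}")
    return name
```
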
input
string<textarea>
required

The text or document to send, as a plain string.

Required string length: 1 - 131072
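
The length bounds can be checked client-side before a request is made. The sketch below assumes the limit counts characters rather than bytes or tokens, which this page does not state; `check_input` is a hypothetical helper name.

```python
MIN_INPUT_LENGTH = 1
MAX_INPUT_LENGTH = 131072  # upper bound quoted in the field description


def check_input(text: str) -> str:
    """Return text unchanged if its length is within the documented bounds."""
    if not isinstance(text, str):
        raise TypeError("input must be a plain string")
    # Assumption: the limit is measured in characters (Python code points).
    length = len(text)
    if not MIN_INPUT_LENGTH <= length <= MAX_INPUT_LENGTH:
        raise ValueError(
            f"input length {length} is outside "
            f"{MIN_INPUT_LENGTH}..{MAX_INPUT_LENGTH}"
        )
    return text
```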

text
object

Response format configuration.

instructions
string

Optional system-style instructions applied on top of input, for models that support them.

metadata
object

Optional model-specific parameters, passed through to the model. For example, PII models accept mask and usecase. See the relevant model page for supported keys.
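
The optional fields can be left out of the body entirely when they are not needed. The sketch below assembles a body that way; `build_body` is a hypothetical helper, and the `mask` and `usecase` values shown in the usage are placeholders, since the accepted values are documented on the model pages rather than here.

```python
import json


def build_body(model, text, instructions=None, metadata=None, text_format=None) -> dict:
    """Assemble a request body, including optional fields only when provided."""
    body = {"model": model, "input": text}
    if instructions is not None:
        body["instructions"] = instructions
    if metadata is not None:
        body["metadata"] = dict(metadata)  # copy so the caller's dict is not shared
    if text_format is not None:
        body["text"] = dict(text_format)  # the "text" response format object
    return body


# Usage: a PII model with metadata keys named on this page.
# The values are placeholders, not real options.
pii_body = build_body(
    "gliner-multi-pii-v1",
    "Contact Jane at jane@example.com.",
    metadata={"mask": "<mask-value>", "usecase": "<usecase-value>"},
)
payload = json.dumps(pii_body)
```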

Response

Success

The generated model response.

id
string

Unique identifier for the response.

Example:

"resp_abc123"

object
string

Object type. Always "response".

Example:

"response"

created
integer

Unix timestamp (seconds) when the response was created.

Example:

1710000000

model
string

The model used for inference.

Example:

"llama-3.1-8b-instruct-fast"

output
object[]

Output message objects produced by the model.

usage
object

Token usage statistics for the request.
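
To pull the generated text out of a response, a client can walk the `output` array. The sketch below follows the shape of the example response on this page (message items containing `output_text` parts); other models may return different shapes, so treat `output_text` as a hypothetical helper for text-generating models only.

```python
def output_text(response: dict) -> str:
    """Concatenate every output_text part found in message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip item types this sketch does not know about
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)


# A trimmed copy of the example response from this page.
sample = {
    "id": "resp_abc123",
    "object": "response",
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "Artemis III, slated for late 2026."}],
        }
    ],
    "usage": {"input_tokens": 112, "output_tokens": 44, "total_tokens": 156},
}

summary = output_text(sample)
usage = sample["usage"]
```

In the example response, `total_tokens` equals `input_tokens` plus `output_tokens`; whether that holds for every model is not stated here.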