POST /chat/completions
Chat-completions style inference
curl --request POST \
  --url https://api.zerogpu.ai/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --header 'x-project-id: <project-id>' \
  --data '
{
  "model": "llama-3.1-8b-instruct-fast",
  "messages": [
    {
      "role": "user",
      "content": "NASA announced that its Artemis III mission is now scheduled for late 2026, marking the first time astronauts will land on the lunar surface since Apollo 17 in 1972. The mission will send a crew of four to the Moon aboard the Orion spacecraft, with two astronauts descending to the south pole using SpaceX Starship as a lunar lander. Scientists are particularly excited about exploring permanently shadowed craters that may contain water ice, which could be critical for sustaining long-term human presence on the Moon."
    }
  ]
}
'
Example response:

{
  "id": "chatcmpl_abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "llama-3.1-8b-instruct-fast",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Artemis III, slated for late 2026, will return astronauts to the Moon for the first time since 1972, landing two crew at the south pole to study shadowed craters that may hold water ice."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 112,
    "completion_tokens": 44,
    "total_tokens": 156
  }
}
Some models are invoked with a chat-completions style body (a messages array) instead of the input field used by the Responses route. The dashboard and model catalog indicate which route applies to each model, and each model page has its own playground with examples for that model.

Install the official SDK from npm or PyPI (pip install zerogpu-api). Source: zerogpu/SDK.

The response JSON shape depends on the model. Handle errors the same way as for other endpoints; see API error codes.
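If you are not using the SDK, you can reproduce the curl request with Python's standard library. This is a minimal sketch, not the official client. The helper names make_request and post_chat are hypothetical. Because this page does not describe the error body format, HTTP errors are surfaced as raw text instead of being parsed.

```python
import json
import urllib.error
import urllib.request

# Endpoint and header names are taken from the curl example on this page.
API_URL = "https://api.zerogpu.ai/v1/chat/completions"


def make_request(api_key: str, project_id: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that mirrors the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
            "x-project-id": project_id,
        },
        method="POST",
    )


def post_chat(api_key: str, project_id: str, body: dict, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response body.

    Non-2xx responses arrive as urllib.error.HTTPError. The error body format
    is not documented here, so it is re-raised as raw text.
    """
    req = make_request(api_key, project_id, body)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        detail = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"ZeroGPU API error {err.code}: {detail}") from err
```

For example, post_chat(api_key, project_id, {"model": "llama-3.1-8b-instruct-fast", "messages": [{"role": "user", "content": "Hello"}]}) should return a dict shaped like the example response. urllib normalizes header-name capitalization, so x-api-key is sent as X-api-key. This does no harm because HTTP header names are case-insensitive.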

Authorizations

x-api-key
string
header
required

Your ZeroGPU API key. Create one in the dashboard under API keys. Send it on every request.

x-project-id
string
header
required

The UUID of the project the request is billed to. Find it in the dashboard project settings.
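Both headers are required, and x-project-id is described as the project's UUID. A client can therefore reject a missing key or a malformed project ID before it sends anything. The sketch below assumes that check is useful on the client side. auth_headers is a hypothetical helper, and the server remains the authority on what it accepts.

```python
import uuid


def auth_headers(api_key: str, project_id: str) -> dict:
    """Return the two required auth headers after basic local checks."""
    if not api_key:
        raise ValueError("x-api-key is required on every request")
    try:
        # The docs describe x-project-id as the project's UUID; reject anything else early.
        uuid.UUID(project_id)
    except ValueError as exc:
        raise ValueError(f"x-project-id must be a UUID, got {project_id!r}") from exc
    # Send the project ID exactly as provided; only its format was checked.
    return {"x-api-key": api_key, "x-project-id": project_id}
```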

Body

application/json
model
enum<string>
required

Model identifier. Open a model page for a dedicated playground with the correct body for that model.

Available options:
llama-3.1-8b-instruct-fast,
LFM2.5-1.2B-Instruct,
LFM2.5-1.2B-Thinking,
deberta-v3-small,
gliner2-base-v1,
gliner-multi-pii-v1,
zlm-v1-followup-questions-edge,
zlm-v1-iab-classify-edge,
zlm-v1-iab-classify-edge-enriched
Example:

"llama-3.1-8b-instruct-fast"

messages
object[]
required

Ordered list of messages making up the conversation so far.

Minimum array length: 1

metadata
object

Optional model-specific parameters, passed through to the model. For example, PII models accept mask and usecase. See the relevant model page for supported keys.

Response

Success

An OpenAI-compatible chat completion.

id
string

Unique identifier for the completion.

Example:

"chatcmpl_abc123"

object
string

Object type. Always chat.completion.

Example:

"chat.completion"

created
integer

Unix timestamp (seconds) when the completion was created.

Example:

1710000000

model
string

The model used for inference.

Example:

"llama-3.1-8b-instruct-fast"

choices
object[]

List of completion choices.

usage
object

Token usage statistics for the request.
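Most callers need only a few of these fields: the first choice's text and finish_reason, the usage totals, and created converted from Unix seconds to a date. The sketch below extracts them. It assumes the OpenAI-compatible shape documented on this page, and since the response shape can vary by model, other models may need different handling. summarize_completion is a hypothetical helper name.

```python
import json
from datetime import datetime, timezone


def summarize_completion(raw: str) -> dict:
    """Pull the commonly used fields out of a chat.completion response body."""
    data = json.loads(raw)
    if data.get("object") != "chat.completion":
        raise ValueError(f"unexpected object type: {data.get('object')!r}")
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    first = choices[0]  # the example response has exactly one choice
    return {
        "id": data.get("id"),
        # `created` is a Unix timestamp in seconds; convert it to an aware UTC datetime.
        "created": datetime.fromtimestamp(data["created"], tz=timezone.utc),
        "content": first["message"]["content"],
        "finish_reason": first.get("finish_reason"),
        "usage": data.get("usage", {}),
    }
```

Applied to the example response, this yields the assistant's summary as content, "stop" as finish_reason, 2024-03-09 16:00:00 UTC as created, and a usage object of 112 prompt tokens plus 44 completion tokens, for 156 in total.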