Responses - ZeroGPU

curl --request POST \
  --url https://api.zerogpu.ai/v1/responses \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --header 'x-project-id: <project-id>' \
  --data '
{
  "model": "llama-3.1-8b-instruct-fast",
  "input": "NASA announced that its Artemis III mission is now scheduled for late 2026, marking the first time astronauts will land on the lunar surface since Apollo 17 in 1972. The mission will send a crew of four to the Moon aboard the Orion spacecraft, with two astronauts descending to the south pole using SpaceX Starship as a lunar lander. Scientists are particularly excited about exploring permanently shadowed craters that may contain water ice, which could be critical for sustaining long-term human presence on the Moon."
}
'

{
  "id": "resp_abc123",
  "object": "response",
  "created": 1710000000,
  "model": "llama-3.1-8b-instruct-fast",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Artemis III, slated for late 2026, will return astronauts to the Moon for the first time since 1972, landing two crew at the south pole to study shadowed craters that may hold water ice."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 112,
    "output_tokens": 44,
    "total_tokens": 156
  }
}

Authorizations

x-api-key

string

header

required

Your ZeroGPU API key. Create one in the dashboard under API keys. Send it on every request.

x-project-id

string

header

required

The UUID of the project the request is billed to. Find it in the dashboard project settings.
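
As a minimal sketch of how the two headers above might be attached from Python, using only the standard library: the endpoint URL is taken from the curl sample, while the function name `build_request` and the placeholder credentials are illustrative assumptions, not part of the ZeroGPU documentation. The request is built but not sent.

```python
import json
import urllib.request

API_URL = "https://api.zerogpu.ai/v1/responses"  # endpoint from the curl sample


def build_request(api_key: str, project_id: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying both required headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,        # your ZeroGPU API key
            "x-project-id": project_id,  # UUID of the billed project
        },
        method="POST",
    )


# Placeholder credentials (hypothetical values); replace with your own.
req = build_request(
    "YOUR_API_KEY",
    "YOUR_PROJECT_UUID",
    {"model": "llama-3.1-8b-instruct-fast", "input": "Hello"},
)
```

To actually send it, pass `req` to `urllib.request.urlopen` and decode the JSON body of the reply.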

Body

application/json

model

enum<string>

required

Model identifier. Open a model page for a dedicated playground with the correct body for that model.

Available options:

llama-3.1-8b-instruct-fast,

LFM2.5-1.2B-Instruct,

LFM2.5-1.2B-Thinking,

deberta-v3-small,

gliner2-base-v1,

gliner-multi-pii-v1,

zlm-v1-followup-questions-edge,

zlm-v1-iab-classify-edge,

zlm-v1-iab-classify-edge-enriched

Example:

"llama-3.1-8b-instruct-fast"

input

string

required

The text or document to send, as a plain string.

Required string length: 1 - 131072
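
A client can check the length bound before sending, which avoids a round trip for inputs that would be rejected. The sketch below assumes the limit counts characters (Python `len`); the page does not say whether it is measured in characters, bytes, or tokens, so treat the server as the authority. The helper name `check_input` is hypothetical.

```python
MIN_INPUT_LENGTH = 1
MAX_INPUT_LENGTH = 131072  # upper bound stated above


def check_input(text: str) -> str:
    """Return text unchanged if it fits the documented bounds, else raise."""
    if not isinstance(text, str):
        raise TypeError("input must be a plain string")
    # Assumption: the limit is counted in characters, not bytes or tokens.
    if not MIN_INPUT_LENGTH <= len(text) <= MAX_INPUT_LENGTH:
        raise ValueError(
            f"input length {len(text)} is outside "
            f"{MIN_INPUT_LENGTH}-{MAX_INPUT_LENGTH}"
        )
    return text
```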

text

object

Response format configuration.

text.format

object

text.format.type

string

Output format for the response, for example text.

Example:

"text"

instructions

string

Optional system-style instructions applied on top of input, for models that support them.

metadata

object

Optional model-specific parameters, passed through to the model. For example, PII models accept mask and usecase. See the relevant model page for supported keys.
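
The body fields above can be assembled as in the following sketch, which includes the two required fields and adds optional ones only when they are supplied. The function name `build_body` and its keyword arguments are illustrative assumptions. The example leaves `metadata` out because its keys are model-specific.

```python
import json
from typing import Any, Dict, Optional


def build_body(
    model: str,
    input_text: str,
    *,
    format_type: Optional[str] = None,
    instructions: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a request body, omitting optional fields that are not set."""
    body: Dict[str, Any] = {"model": model, "input": input_text}
    if format_type is not None:
        # Nested as text.format.type, matching the field layout above.
        body["text"] = {"format": {"type": format_type}}
    if instructions is not None:
        body["instructions"] = instructions
    if metadata is not None:
        body["metadata"] = metadata
    return body


body = build_body(
    "llama-3.1-8b-instruct-fast",
    "The Moon orbits the Earth.",
    format_type="text",
    instructions="Answer in one sentence.",
)
print(json.dumps(body, indent=2))
```

Omitting unset fields, rather than sending them as null, is a design choice made here; the page does not state how the API treats null values.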

Response

Success

The generated model response.

id

string

Unique identifier for the response.

Example:

"resp_abc123"

object

string

Object type. Always response.

Example:

"response"

created

integer

Unix timestamp (seconds) when the response was created.

Example:

1710000000

model

string

The model used for inference.

Example:

"llama-3.1-8b-instruct-fast"

output

object[]

Output message objects produced by the model.

output.type

string

Output item type.

Example:

"message"

output.role

string

Author of the output. Always assistant.

Example:

"assistant"

output.content

object[]

Content parts that make up the message.

output.content.type

string

Content part type.

Example:

"output_text"

output.content.text

string

The generated text.

Example:

"Artemis III, slated for late 2026, will return astronauts to the Moon for the first time since 1972."
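
Because the generated text sits two levels deep (output, then content, then text), a small helper is often convenient. The sketch below walks the structure described above and collects every `output_text` part. The helper name `extract_text` is hypothetical, and joining multiple parts with no separator is an assumption.

```python
def extract_text(response: dict) -> str:
    """Concatenate the text of every output_text part in a response."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip any item that is not a message
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)


# A trimmed version of the sample response from this page.
sample = {
    "id": "resp_abc123",
    "object": "response",
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "Artemis III, slated for late 2026."}
            ],
        }
    ],
}
print(extract_text(sample))  # prints: Artemis III, slated for late 2026.
```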

usage

object

Token usage statistics for the request.

usage.input_tokens

integer

Number of tokens in the input.

Example:

112

usage.output_tokens

integer

Number of tokens generated.

Example:

44

usage.total_tokens

integer

Total tokens consumed (input plus output).

Example:

156
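
Since total_tokens is described as input plus output, a client that logs usage can sanity-check the three counters. A minimal sketch, using the example values from this page; the function name `usage_is_consistent` is an illustrative assumption.

```python
def usage_is_consistent(usage: dict) -> bool:
    """Check that total_tokens equals input_tokens plus output_tokens."""
    return usage["total_tokens"] == usage["input_tokens"] + usage["output_tokens"]


usage = {"input_tokens": 112, "output_tokens": 44, "total_tokens": 156}
print(usage_is_consistent(usage))  # prints: True
```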