reviews.csv with one review per row, and you get back a tagged.csv with a sentiment label and a short list of topics for every row, plus a recoverable list of any rows that failed. By combining the Batch APIβs OpenAI-compatible JSONL flow with LFM2.5-1.2B-Instruct, this notebook walks you through a practical pattern where thousands of rows are tagged overnight at a fraction of the cost of synchronous calls, keyed back to their source rows by custom_id.
For the full reference, see the Batch API quickstart.
For another end-to-end workflow, see:
Screen resumes with LangChain and ZeroGPU.
In this notebook, youβll explore:
- ZeroGPU Batch API: An asynchronous, OpenAI-compatible endpoint that takes thousands of
/v1/chat/completionsrequests as a single JSONL file and returns the results within a completion window, at a lower per-request cost than synchronous calls. Here it tags every row of a customer-review CSV in one overnight job. - ZeroGPU: An ultra-fast, compute-efficient inference provider for apps and agents. We run purpose-built small and nano language models across an edge-powered network for the high-volume, purpose-specific tasks your app or agent runs constantly. Plug in our OpenAI-compatible API and youβre live - zero GPU infrastructure, serverless, auto-scaling by default.
- LFM2.5-1.2B-Instruct: A small, fast instruct model that handles short-form classification well, keeping per-row cost low while still producing open-vocabulary topics and a JSON tag. See the model card for context window and playground.
Run this example
Run it in Google Colab and execute cells top to bottom β no setup required. π·οΈ Run in Google Colab β The notebook generates the dataset, builds the JSONL file, and runs the full Batch API workflow automatically.π₯ Watch the Video Guide
Prefer a quick walkthrough? Watch the full demo here:π¦ Installation
First, installrequests, the only dependency for driving the Batch API from Python. You reach ZeroGPU through its OpenAI-compatible REST surface, so no SDK is required:
jq to pretty-print JSON responses. For the full lifecycle, see the Batch API quickstart.
π Setting Up API Keys
Youβll need to set up your ZeroGPU API key and Project ID so that every Batch API call authenticates. This ensures the upload, create, poll, and download calls can reach ZeroGPU securely. You can go to here to get an API key and Project ID from ZeroGPU. The key starts withzgpu-api- and the Project ID (UUID) is on the project settings page.
Python
x-api-key and x-project-id headers. If you use an SDK client, authentication is handled for you. The cURL examples read the same two values from your shell environment:
Python
π Tag a CSV of Reviews Overnight
This section takes a CSV of customer reviews and produces a tagged copy plus a recoverable list of failures, with the Batch API running every row as one asynchronous job while you sleep. Your support tool exportsreviews.csv, one review per row. You want a sentiment label and a short list of topics for every row, but calling the model synchronously across thousands of rows is slow and costly. Instead you submit them as a single batch, poll until it finishes, and merge the results back by custom_id.
The job has five steps: prepare the data, build a JSONL request file, upload it, create the batch, then poll and download the results.
Step 1: Prepare the input CSV
The dataset is generated directly in the Colab notebook, so you do not need to prepare any files manually. Each row includes a uniquereview_id and a free-text review. One row is intentionally malformed (empty review) to demonstrate how invalid input is handled.
Keep the review_id column unique. It is what links a tagged row back to its source, and duplicates are rejected at create time.
Step 2: Build the JSONL, one request per row
The notebook builds this JSONL file programmatically from the CSV, so you do not need to create it manually. We pin the same JSON-only system prompt on every line:r-001 looks like this:
r-007 has no review text. The builder skips it locally rather than spending a request on a line the API would reject, and records it as a local skip.
Step 3: Upload the file and create the batch
Upload the JSONL withpurpose=batch, then create the batch against /v1/chat/completions:
POST /v1/batches returns, so a duplicate custom_id, a line over 1 MB, or "stream": true rejects the whole batch with a 400 that points at the offending line. Fix the JSONL locally and resubmit; nothing is charged for a rejected create.
Step 4: Poll until the batch finishes
PollGET /v1/batches/{id} until the batch reaches a terminal state. A batch ends in one of four: completed, failed, expired, or cancelled. Only completed guarantees an output_file_id, and even a completed batch can carry a populated error_file_id when some lines failed. Donβt poll faster than every 30 seconds; the status only changes on minute-scale transitions.
Python
Step 5: Download and merge into a tagged CSV
The full script below reads the CSV, builds the JSONL (skipping the empty row), uploads, creates, polls, downloads the output and error files, and merges everything back intotagged.csv by custom_id. Failed rows land in failed.csv for inspection.
tag_reviews.py
custom_id, then walks the original CSV in order to attach tags. This means the row order of tagged.csv is identical to reviews.csv, even though the Batch API returned results in arbitrary order, and rows that failed land with empty sentiment and topics columns instead of being dropped silently.
reviews.csv goes in:
tagged.csv comes out:
r-007 carries through with empty tags because it was filtered locally before upload, so it never reached the API.
Even a completed batch can have a populated error_file_id when some lines failed but the batch as a whole ran. A failed line carries a null response and a populated error. If you skip the local filter and let the API reject the empty row, the error file carries a line like this:
input.jsonl, keyed by the custom_ids that appear in the error file, then upload and create a new batch the same way as before:
recover.py
invalid_request_error (repair the line locally and retry), rate_limit_exceeded (wait, raise quota, then retry the failed custom_ids), model_error (transient, safe to retry as-is), and timeout (shorten the input or lower max_tokens, then retry). For a malformed row like r-007, the right fix is upstream: drop or backfill it before building the JSONL, which is exactly what build_jsonl already does. See the Errors reference for the full table.
π From one JSONL upload, the Batch API tagged every review row, returned results keyed by custom_id, and surfaced failures as a recoverable list, all in a single overnight job that costs a fraction of synchronous calls.
π Highlights
This notebook has guided you through setting up and running a ZeroGPU Batch API workflow for tagging a CSV of customer reviews with sentiment and topics. You can adapt and expand this example for various other scenarios requiring high-volume, asynchronous classification over tabular data. Key tools utilized in this notebook include:- ZeroGPU Batch API: An asynchronous, OpenAI-compatible endpoint that takes thousands of
/v1/chat/completionsrequests as a single JSONL file and returns the results within a completion window, at a lower per-request cost than synchronous calls. Here it tags every row of a customer-review CSV in one overnight job. - ZeroGPU: An ultra-fast, compute-efficient inference provider for apps and agents. We run purpose-built small and nano language models across an edge-powered network for the high-volume, purpose-specific tasks your app or agent runs constantly. Plug in our OpenAI-compatible API and youβre live - zero GPU infrastructure, serverless, auto-scaling by default.
- LFM2.5-1.2B-Instruct: A small, fast instruct model that handles short-form classification well, keeping per-row cost low while still producing open-vocabulary topics and a JSON tag. See the model card for context window and playground.

