This tutorial walks through three production-style use cases for the Data extraction recipe. You will work with plain text only (resumes, LinkedIn-style exports, job posts), not OCR or PDF parsing.
All sample people and companies in the example dataset are synthetic and fictional.
What you will build
| Step | Input | Output |
|---|---|---|
| 1 | Plain-text resume | candidate JSON (name, title, skills, …) |
| 2 | Scraped profile text | profile JSON (headline, role, location, …) |
| 3 | Job description | Labeled skill/tool spans (ner) |
Model: gliner2-base-v1
Endpoint: POST https://api.zerogpu.ai/v1/responses
Get your API key and project ID from the dashboard. See Authentication.
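As a rough sketch, a request to this endpoint could be assembled with the Python standard library as shown here. Only the URL, the POST method, and the model name come from this tutorial. The bearer-token scheme, the "x-project-id" header name, and the make_request helper are assumptions made for illustration; the Authentication page is the source of truth for how the API key and project ID are actually passed.

```python
import json
import urllib.request

ENDPOINT = "https://api.zerogpu.ai/v1/responses"


def make_request(payload: dict, api_key: str, project_id: str) -> urllib.request.Request:
    """Wrap a JSON payload in a POST request (header names are assumptions)."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "x-project-id": project_id,            # hypothetical header name
        },
    )


# Placeholder credentials; nothing is sent until urlopen() is called.
request = make_request(
    {"model": "gliner2-base-v1"},
    api_key="YOUR_API_KEY",
    project_id="YOUR_PROJECT_ID",
)
# To send: urllib.request.urlopen(request, timeout=30), then json.load() the body.
```

Building the request separately from sending it keeps the payload easy to inspect and log before any network call happens.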
Before you start
Clone or browse the example data and scripts in the docs repo.
Step 1: Extract a candidate from a resume
Open dataset/resumes.jsonl and pick resume-001 (synthetic). Define a schema under metadata.schema with the json use case.
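A minimal sketch of what such a request body might look like follows. The model name, metadata.schema, and the json use case come from this tutorial; the "input" key, the schema shape (entity name mapped to field descriptions), and the build_candidate_request helper are assumptions for illustration, so check the Data extraction recipe for the exact format.

```python
import json

# Hypothetical schema shape: entity name -> {field name: description}.
CANDIDATE_SCHEMA = {
    "candidate": {
        "name": "Full name of the candidate",
        "title": "Current or most recent job title",
        "skills": "Technical skills mentioned in the resume",
    }
}


def build_candidate_request(resume_text: str) -> dict:
    """Build a json-use-case request body (key names partly assumed)."""
    if not resume_text.strip():
        raise ValueError("resume_text is empty")
    return {
        "model": "gliner2-base-v1",
        "input": resume_text,  # assumed key name for the source text
        "metadata": {
            "usecase": "json",
            "schema": CANDIDATE_SCHEMA,  # assumed schema shape
        },
    }


# Synthetic, hand-written resume text for illustration.
body = build_candidate_request(
    "Jane Example\nSenior Data Engineer\nSkills: Python, SQL"
)
print(json.dumps(body, indent=2))
```

Keeping the schema in one constant makes it easier to follow the "start small" advice: fields are added or tightened in a single place.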
The response should contain a candidate object (often under data.candidate[0]) with populated strings. If a field is missing from the source text, it may come back empty. Tighten field descriptions or add a second pass only where needed.
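The parsing step can be sketched as below. The data.candidate[0] path is taken from the text above; everything else, including the sample response and the helper names first_candidate and missing_fields, is a hypothetical illustration rather than the documented response format.

```python
from typing import Any, Dict, List


def first_candidate(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return data.candidate[0], tolerating a bare object or a missing key."""
    candidates = (response.get("data") or {}).get("candidate") or []
    if isinstance(candidates, dict):
        return candidates
    return candidates[0] if candidates else {}


def missing_fields(candidate: Dict[str, Any], required: List[str]) -> List[str]:
    """List required fields that are absent or empty in the candidate."""
    return [field for field in required if not candidate.get(field)]


# Hand-written sample response, not real model output.
sample = {
    "data": {
        "candidate": [
            {"name": "Jane Example", "title": "", "skills": "Python, SQL"}
        ]
    }
}
candidate = first_candidate(sample)
gaps = missing_fields(candidate, ["name", "title", "skills"])
print(gaps)  # ['title']
```

A non-empty list of gaps is a signal to tighten that field's description or to schedule a second pass for just those fields.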
Step 2: Structure LinkedIn-style profile text
Scrapers often return a single blob: headline, About, Experience. Use a profile schema (same json use case).
Use profiles.jsonl (profile-001) as input. Parse the response the same way as Step 1.
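One way to pull a single record out of a JSONL file and pair it with a profile schema is sketched here. The "id" and "text" record keys, the "input" request key, the schema shape, and both helper names are assumptions; the actual layout of profiles.jsonl and of the request body may differ.

```python
import io
import json
from typing import Iterable, Optional

# Hypothetical schema shape, using the profile fields named in this tutorial.
PROFILE_SCHEMA = {
    "profile": {
        "headline": "Profile headline",
        "role": "Current role",
        "location": "City or region",
    }
}


def find_record(lines: Iterable[str], record_id: str) -> Optional[dict]:
    """Return the first JSONL record whose assumed 'id' key matches."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if record.get("id") == record_id:
            return record
    return None


def build_profile_request(text: str) -> dict:
    """Build a json-use-case request body for profile text (keys assumed)."""
    return {
        "model": "gliner2-base-v1",
        "input": text,  # assumed key name
        "metadata": {"usecase": "json", "schema": PROFILE_SCHEMA},
    }


# In-memory stand-in for profiles.jsonl with one synthetic record.
fake_file = io.StringIO(
    '{"id": "profile-001", "text": "Data Engineer at Example Corp. Berlin."}\n'
)
record = find_record(fake_file, "profile-001")
body = build_profile_request(record["text"])
```

With a real file, passing the handle from open(path, encoding="utf-8") to find_record works the same way, since the function only iterates over lines.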
This tutorial assumes you already have compliant text from your pipeline. ZeroGPU does not scrape LinkedIn for you.
Step 3: Tag skills in a job post (NER)
When you need categories instead of fixed columns, set metadata.usecase to ner and pass labels.
Lower threshold to recall more spans; raise it when precision matters more.
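The effect of the threshold can be illustrated with a small sketch. The ner use case, labels, and threshold are named in this tutorial; where labels and threshold sit in the request body, the 0 to 1 score range, the span shape (text, label, score), and the helper names are assumptions. The scores below are hand-written for illustration, not model output.

```python
from typing import Dict, List, Sequence


def build_ner_request(text: str, labels: Sequence[str], threshold: float = 0.5) -> dict:
    """Build an ner-use-case request body (placement of keys is assumed)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold is assumed to lie between 0 and 1")
    return {
        "model": "gliner2-base-v1",
        "input": text,  # assumed key name
        "metadata": {
            "usecase": "ner",
            "labels": list(labels),
            "threshold": threshold,
        },
    }


def filter_spans(spans: List[Dict], threshold: float) -> List[Dict]:
    """Keep spans scoring at or above the threshold."""
    return [span for span in spans if span["score"] >= threshold]


body = build_ner_request(
    "We need Python and Airflow experience.", ["skill", "tool"], threshold=0.3
)

# Illustrative spans with made-up scores.
spans = [
    {"text": "Python", "label": "skill", "score": 0.92},
    {"text": "Airflow", "label": "tool", "score": 0.61},
    {"text": "experience", "label": "skill", "score": 0.34},
]
loose = filter_spans(spans, 0.3)   # keeps all three spans (higher recall)
strict = filter_spans(spans, 0.7)  # keeps only "Python" (higher precision)
```

Requesting with a low threshold and filtering locally lets you compare several cutoffs against one response instead of re-calling the API for each.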
Step 4: Run the full example dataset
Batch all synthetic resumes or profiles locally. Outputs are written to tutorials/data-extraction/outputs/*.jsonl. Inspect them before wiring into production ETL.
For many files at once, combine this with Batch requests patterns (parallelism, retries).
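The parallelism-plus-retries idea can be sketched with the standard library alone. This is not the Batch requests implementation: run_batch and with_retries are hypothetical helpers, and the extract callable stands in for whatever function sends one text to the API. The demo uses a fake extractor that fails once per item so the retry path is exercised without any network access.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence


def with_retries(fn: Callable[[Any], Any], attempts: int = 3,
                 base_delay: float = 0.5) -> Callable[[Any], Any]:
    """Wrap fn so failures are retried with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def wrapped(item: Any) -> Any:
        for attempt in range(attempts):
            try:
                return fn(item)
            except Exception:  # in real use, catch only transient errors
                if attempt == attempts - 1:
                    raise
                time.sleep(base_delay * 2 ** attempt)

    return wrapped


def run_batch(items: Sequence[Any], extract: Callable[[Any], Any],
              max_workers: int = 4, attempts: int = 3,
              base_delay: float = 0.5) -> List[Any]:
    """Run extract over items in a thread pool, preserving input order."""
    worker = with_retries(extract, attempts, base_delay)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, items))


calls = {}


def fake_extract(text: str) -> dict:
    """Stand-in for an API call: fails on the first attempt per item."""
    calls[text] = calls.get(text, 0) + 1
    if calls[text] == 1:
        raise RuntimeError("transient failure")
    return {"id": text, "ok": True}


results = run_batch(["resume-001", "resume-002"], fake_extract, base_delay=0.0)
```

Threads suit this workload because each call mostly waits on the network; keep max_workers modest so retries do not pile onto a rate limit.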
Optional: PII on inbound text
Before storing user-submitted resumes, run PII extraction with gliner-multi-pii-v1 and extract-pii or redact.
Production tips
- Start small: 5-8 schema fields beat 20 vague ones.
- Golden set: Keep 10-20 labeled examples from your real text shapes; re-run after schema changes.
- Text only: PDF and image pipelines are out of scope for this tutorial; convert to text upstream.
- Monitor: Use Logs and Usage in the dashboard.
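The golden-set tip can be turned into a small regression check, sketched below. It assumes both the labeled examples and the extraction results are available as dictionaries keyed by example ID; the field_accuracy and normalize helpers and the sample records are hypothetical, and exact match after light normalization is just one possible comparison rule.

```python
from typing import Any, Dict, List


def normalize(value: Any) -> str:
    """Compare values case-insensitively, treating None as empty."""
    return "" if value is None else str(value).strip().lower()


def field_accuracy(golden: Dict[str, dict], predicted: Dict[str, dict],
                   fields: List[str]) -> Dict[str, float]:
    """Share of golden examples where each field matches after normalization."""
    hits = {field: 0 for field in fields}
    for example_id, expected in golden.items():
        actual = predicted.get(example_id, {})
        for field in fields:
            if normalize(expected.get(field)) == normalize(actual.get(field)):
                hits[field] += 1
    total = len(golden)
    return {field: (hits[field] / total if total else 0.0) for field in fields}


# Synthetic golden labels and hand-written "predictions" for illustration.
golden = {
    "resume-001": {"name": "Jane Example", "title": "Data Engineer"},
    "resume-002": {"name": "Sam Sample", "title": "ML Engineer"},
}
predicted = {
    "resume-001": {"name": "jane example", "title": "Data Engineer"},
    "resume-002": {"name": "Sam Sample", "title": "Engineer"},
}
scores = field_accuracy(golden, predicted, ["name", "title"])
print(scores)  # {'name': 1.0, 'title': 0.5}
```

Re-running this after each schema change shows which field regressed, which is more actionable than a single overall score.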
Go deeper
Data extraction recipe
API reference-style examples for json, NER, and PII.
Example dataset (GitHub)
Synthetic resumes, profiles, and job posts (JSONL).
Companion article
Long-form walkthrough in the docs repo.
gliner2-base-v1 playground
Try schemas interactively in the model catalog.

