This tutorial walks through three production-style use cases for the Data extraction recipe. It works with plain text only (resumes, LinkedIn-style exports, job posts); OCR and PDF parsing are out of scope.
All sample people and companies in the example dataset are synthetic and fictional.

What you will build

| Step | Input | Output |
|------|-------|--------|
| 1 | Plain-text resume | candidate JSON (name, title, skills, …) |
| 2 | Scraped profile text | profile JSON (headline, role, location, …) |
| 3 | Job description | Labeled skill/tool spans (ner) |
Model (steps 1-3): gliner2-base-v1
Endpoint: POST https://api.zerogpu.ai/v1/responses
Get your API key and project ID from the dashboard. See Authentication.

Before you start

Clone or browse the example data and scripts in the docs repo:
tutorials/data-extraction/
├── dataset/          # resumes.jsonl, profiles.jsonl, job_posts.jsonl
├── schemas/          # resume.json, linkedin-profile.json
├── scripts/          # run_batch_extraction.py
└── blog-post.md      # companion article (optional read)
GitHub: zerogpu/docs/tutorials/data-extraction (see the README for setup).

Step 1: Extract a candidate from a resume

Open dataset/resumes.jsonl and pick resume-001 (synthetic). Set metadata.usecase to json and define the schema under metadata.schema.
curl --location 'https://api.zerogpu.ai/v1/responses' \
  --header 'content-type: application/json' \
  --header 'x-api-key: YOUR_API_KEY' \
  --header 'x-project-id: YOUR_PROJECT_ID' \
  --data '{
    "model": "gliner2-base-v1",
    "input": "ALEX RIVERA\nSan Francisco, CA | [email protected] | (415) 555-0198\n\nSUMMARY\nBackend engineer with 7 years building APIs and data pipelines...",
    "metadata": {
      "usecase": "json",
      "schema": {
        "candidate": [
          "full_name::str::Candidate full name",
          "email::str::Email address",
          "phone::str::Phone number",
          "location::str::City, state, or country",
          "current_title::str::Most recent job title",
          "current_company::str::Most recent employer",
          "skills::str::Comma-separated key skills mentioned"
        ]
      }
    }
  }'
Check your result: You should see a candidate object (often under data.candidate[0]) with populated strings. If a field is missing from the source text, it may be empty. Tighten field descriptions or add a second pass only where needed.
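
The same request can be sketched in Python with only the standard library. The request body mirrors the curl call above. The helper names (build_payload, post_json, first_record) are hypothetical, and the response shape is an assumption based on the "often under data.candidate[0]" note, so the lookup also tolerates a top-level group and a single object instead of a list.

```python
import json
import urllib.request

API_URL = "https://api.zerogpu.ai/v1/responses"

CANDIDATE_SCHEMA = {
    "candidate": [
        "full_name::str::Candidate full name",
        "email::str::Email address",
        "phone::str::Phone number",
        "location::str::City, state, or country",
        "current_title::str::Most recent job title",
        "current_company::str::Most recent employer",
        "skills::str::Comma-separated key skills mentioned",
    ]
}


def build_payload(text, schema, model="gliner2-base-v1"):
    """Build the JSON body for a json-use-case extraction request."""
    return {
        "model": model,
        "input": text,
        "metadata": {"usecase": "json", "schema": schema},
    }


def post_json(payload, api_key, project_id, timeout=30):
    """POST the payload with the same headers as the curl example."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "x-api-key": api_key,
            "x-project-id": project_id,
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def first_record(response, group):
    """Return the first extracted object for `group`, or {} if absent.

    Assumption: the group sits under "data" (or at the top level) and is
    either a list of objects or a single object.
    """
    container = response.get("data", response)
    value = container.get(group) if isinstance(container, dict) else None
    if isinstance(value, list):
        value = value[0] if value else None
    return value if isinstance(value, dict) else {}
```

To use it, pass the resume text to build_payload, send the result with post_json using your API key and project ID, then call first_record(response, "candidate"). An empty dict back from first_record is a signal to inspect the raw response rather than to assume extraction failed.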

Step 2: Structure LinkedIn-style profile text

Scrapers often return a single blob: headline, About, Experience. Use a profile schema (same json use case).
"schema": {
  "profile": [
    "name::str::Person name",
    "headline::str::Professional headline",
    "location::str::Profile location",
    "current_role::str::Current job title",
    "current_company::str::Current company"
  ]
}
Paste text from profiles.jsonl (profile-001) as input. Parse the response the same way as Step 1.
This tutorial assumes you already have compliant text from your pipeline. ZeroGPU does not scrape LinkedIn for you.
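
The field strings in both schemas follow a name::type::description pattern. When schemas are kept in code, a small builder keeps that pattern consistent. This is a minimal sketch: field_spec and build_schema are hypothetical helpers, and only the str type appears in this tutorial, so other type names are not assumed to be supported.

```python
def field_spec(name, description, dtype="str"):
    """Format one field as 'name::type::description'."""
    for part in (name, dtype):
        # "::" is the separator, so it cannot appear inside a name or type.
        if "::" in part or not part.strip():
            raise ValueError(f"invalid field component: {part!r}")
    return f"{name}::{dtype}::{description.strip()}"


def build_schema(group, fields):
    """Build a schema dict: {group: ['name::str::description', ...]}."""
    return {group: [field_spec(name, description) for name, description in fields]}


PROFILE_FIELDS = [
    ("name", "Person name"),
    ("headline", "Professional headline"),
    ("location", "Profile location"),
    ("current_role", "Current job title"),
    ("current_company", "Current company"),
]

# Produces the same structure as the "schema" fragment above.
profile_schema = build_schema("profile", PROFILE_FIELDS)
```

Keeping fields as (name, description) pairs makes it easy to tighten one description and re-run without hand-editing JSON.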

Step 3: Tag skills in a job post (NER)

When you need categories instead of fixed columns, set metadata.usecase to ner and pass labels:
curl --location 'https://api.zerogpu.ai/v1/responses' \
  --header 'content-type: application/json' \
  --header 'x-api-key: YOUR_API_KEY' \
  --header 'x-project-id: YOUR_PROJECT_ID' \
  --data '{
    "model": "gliner2-base-v1",
    "input": "We are hiring a Backend Engineer to design REST APIs in Python, deploy on AWS EKS, and maintain PostgreSQL databases.",
    "metadata": {
      "usecase": "ner",
      "labels": ["programming language", "database", "cloud platform", "framework"],
      "threshold": 0.3
    }
  }'
Lower the threshold to recall more spans; raise it when precision matters more.
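
One way to explore this trade-off without re-sending requests is to call the API once at a low threshold and filter client-side. The sketch below assumes each returned span carries text, label, and score keys; that shape is an assumption for illustration, not the documented response format, and the scores are made up.

```python
from collections import defaultdict


def filter_spans(spans, threshold):
    """Keep spans whose confidence score is at least `threshold`."""
    return [span for span in spans if span["score"] >= threshold]


def group_by_label(spans):
    """Map each label to the span texts tagged with it, in input order."""
    grouped = defaultdict(list)
    for span in spans:
        grouped[span["label"]].append(span["text"])
    return dict(grouped)


# Hypothetical spans for the job post above; the scores are illustrative only.
spans = [
    {"text": "Python", "label": "programming language", "score": 0.91},
    {"text": "PostgreSQL", "label": "database", "score": 0.84},
    {"text": "AWS EKS", "label": "cloud platform", "score": 0.62},
    {"text": "REST APIs", "label": "framework", "score": 0.34},
]

recall_oriented = group_by_label(filter_spans(spans, 0.3))
precision_oriented = group_by_label(filter_spans(spans, 0.7))
```

Comparing the two groupings on a handful of real job posts shows which borderline spans a stricter threshold would drop before you commit to a value.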

Step 4: Run the full example dataset

Batch all synthetic resumes or profiles locally:
cd tutorials/data-extraction/scripts
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
export ZEROGPU_API_KEY="zgpu-..."
export ZEROGPU_PROJECT_ID="your-project-uuid"
.venv/bin/python run_batch_extraction.py --dataset resumes --limit 3
.venv/bin/python run_batch_extraction.py --dataset profiles --limit 3
.venv/bin/python run_batch_extraction.py --dataset job_posts
Results are written to tutorials/data-extraction/outputs/*.jsonl. Inspect them before wiring them into production ETL. For many files at once, combine this with the patterns in Batch requests (parallelism, retries).
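
The parallelism and retry patterns mentioned above can be sketched with the standard library. This is not how run_batch_extraction.py is implemented; with_retries and run_batch are hypothetical helpers, and extract stands in for whatever function sends one record to the API.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def with_retries(func, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Wrap `func` so failures are retried with exponential backoff and jitter.

    For brevity this retries on any exception; in production, narrow it to
    transient errors such as timeouts or rate limits.
    """
    def wrapper(*args, **kwargs):
        for attempt in range(attempts):
            try:
                return func(*args, **kwargs)
            except Exception:
                if attempt == attempts - 1:
                    raise  # out of attempts: surface the last error
                sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    return wrapper


def run_batch(records, extract, max_workers=4):
    """Run `extract` over records in parallel, preserving input order.

    Returns a list of (record, result, error) tuples so one bad record
    does not abort the whole batch.
    """
    def safe(record):
        try:
            return record, extract(record), None
        except Exception as error:
            return record, None, error

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe, records))
```

A typical call is run_batch(records, with_retries(extract)), after which rows with a non-None error can be written to a separate file for a later re-run.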

Optional: PII on inbound text

Before storing user-submitted resumes, run PII extraction with gliner-multi-pii-v1, using either extract-pii or redact.
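
A request body for this step might look like the sketch below. It assumes extract-pii and redact are metadata.usecase values, in the same way json and ner are in the earlier steps; the tutorial does not confirm this, so check the Data extraction recipe before relying on it. build_pii_payload is a hypothetical helper.

```python
def build_pii_payload(text, usecase="extract-pii", model="gliner-multi-pii-v1"):
    """Build a PII request body.

    Assumption: extract-pii and redact are passed as metadata.usecase,
    mirroring the json and ner use cases shown earlier.
    """
    if usecase not in ("extract-pii", "redact"):
        raise ValueError(f"unsupported PII use case: {usecase!r}")
    return {"model": model, "input": text, "metadata": {"usecase": usecase}}
```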

Production tips

  • Start small: 5-8 schema fields beat 20 vague ones.
  • Golden set: Keep 10-20 labeled examples from your real text shapes; re-run after schema changes.
  • Text only: PDF and image pipelines are out of scope for this tutorial; convert to text upstream.
  • Monitor: Use Logs and Usage in the dashboard.
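
The golden-set tip can be turned into a small regression check. The sketch below computes per-field exact-match accuracy between labeled examples and extracted records; field_accuracy and normalize are hypothetical helpers, and exact match after light normalization is one reasonable scoring choice among several.

```python
def normalize(value):
    """Lowercase and collapse whitespace so formatting noise is not a miss."""
    return " ".join(str(value or "").lower().split())


def field_accuracy(golden, predicted):
    """Compute per-field exact-match accuracy over aligned example lists.

    `golden` and `predicted` are equal-length lists of dicts; the fields
    scored are the ones present in each golden example.
    """
    if len(golden) != len(predicted):
        raise ValueError("golden and predicted must be the same length")
    hits, totals = {}, {}
    for expected, actual in zip(golden, predicted):
        for field, value in expected.items():
            totals[field] = totals.get(field, 0) + 1
            if normalize(actual.get(field)) == normalize(value):
                hits[field] = hits.get(field, 0) + 1
    return {field: hits.get(field, 0) / totals[field] for field in totals}
```

Re-running this after every schema change shows which fields regressed, which is usually more actionable than a single overall score.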

Go deeper

  • Data extraction recipe: API reference-style examples for json, NER, and PII.
  • Example dataset (GitHub): Synthetic resumes, profiles, and job posts (JSONL).
  • Companion article: Long-form walkthrough in the docs repo.
  • gliner2-base-v1 playground: Try schemas interactively in the model catalog.