Tutorial: Resume & profile extraction

This tutorial walks through three production-style use cases for the Data extraction recipe. All inputs are plain text (resumes, LinkedIn-style exports, job posts); OCR and PDF parsing are not covered.

All sample people and companies in the example dataset are synthetic and fictional.

What you will build

Step	Input	Output
1	Plain-text resume	`candidate` JSON (name, title, skills, …)
2	Scraped profile text	`profile` JSON (headline, role, location, …)
3	Job description	Labeled skill/tool spans (`ner`)

Model (steps 1-3): gliner2-base-v1
Endpoint: POST https://api.zerogpu.ai/v1/responses

Get your API key and project ID from the dashboard. See Authentication.

Before you start

Clone or browse the example data and scripts in the docs repo:

tutorials/data-extraction/
├── dataset/          # resumes.jsonl, profiles.jsonl, job_posts.jsonl
├── schemas/          # resume.json, linkedin-profile.json
├── scripts/          # run_batch_extraction.py
└── blog-post.md      # companion article (optional read)

GitHub: zerogpu/docs/tutorials/data-extraction (see the README for setup).
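
Because the dataset files are JSONL (one JSON object per line), a single record can be pulled out by identifier with a few lines of Python. This is a minimal sketch: the `id` field name is an assumption about the record layout, and `find_record` is an illustrative helper, so check the README or the first line of the file for the actual keys.

```python
import json
from pathlib import Path
from typing import Any, Optional


def find_record(path: Path, record_id: str, id_field: str = "id") -> Optional[dict[str, Any]]:
    """Return the first JSONL record whose id_field equals record_id, or None."""
    # Assumption: every record carries its identifier under `id_field`.
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:  # tolerate blank lines between records
                continue
            record = json.loads(line)
            if record.get(id_field) == record_id:
                return record
    return None
```

For example, `find_record(Path("dataset/resumes.jsonl"), "resume-001")` would return the matching record, or `None` if the identifier field is named differently.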

Step 1: Extract a candidate from a resume

Open dataset/resumes.jsonl and pick resume-001 (synthetic). Set metadata.usecase to json and define the fields you want under metadata.schema.

curl --location 'https://api.zerogpu.ai/v1/responses' \
  --header 'content-type: application/json' \
  --header 'x-api-key: YOUR_API_KEY' \
  --header 'x-project-id: YOUR_PROJECT_ID' \
  --data '{
    "model": "gliner2-base-v1",
    "input": "ALEX RIVERA\nSan Francisco, CA | [email protected] | (415) 555-0198\n\nSUMMARY\nBackend engineer with 7 years building APIs and data pipelines...",
    "metadata": {
      "usecase": "json",
      "schema": {
        "candidate": [
          "full_name::str::Candidate full name",
          "email::str::Email address",
          "phone::str::Phone number",
          "location::str::City, state, or country",
          "current_title::str::Most recent job title",
          "current_company::str::Most recent employer",
          "skills::str::Comma-separated key skills mentioned"
        ]
      }
    }
  }'
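
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above: the endpoint, headers, and body fields are copied from it. `build_json_request` and `post_extraction` are illustrative helper names, not part of any ZeroGPU SDK, the sample text and schema are shortened, and the reply is assumed to be a JSON object.

```python
import json
import urllib.request
from typing import Any

API_URL = "https://api.zerogpu.ai/v1/responses"
MODEL = "gliner2-base-v1"


def build_json_request(text: str, schema: dict[str, list[str]]) -> dict[str, Any]:
    """Build the request body for the json use case."""
    return {
        "model": MODEL,
        "input": text,
        "metadata": {"usecase": "json", "schema": schema},
    }


def post_extraction(body: dict[str, Any], api_key: str, project_id: str) -> dict[str, Any]:
    """POST the body and decode the reply (assumed to be a JSON object)."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "x-api-key": api_key,
            "x-project-id": project_id,
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Shortened version of the schema used in the curl example.
candidate_schema = {
    "candidate": [
        "full_name::str::Candidate full name",
        "current_title::str::Most recent job title",
        "skills::str::Comma-separated key skills mentioned",
    ]
}
body = build_json_request(
    "ALEX RIVERA\nSan Francisco, CA\n\nSUMMARY\nBackend engineer with 7 years building APIs and data pipelines...",
    candidate_schema,
)
```

Calling `post_extraction(body, api_key, project_id)` with your own credentials sends the request; keeping the body builder separate makes it easy to unit-test schemas without network access.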

Check your result: You should see a candidate object (often under data.candidate[0]) with populated strings. If a field is missing from the source text, it may be empty. Tighten field descriptions or add a second pass only where needed.
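
Because the location of the record can vary, it helps to read the response defensively. The following sketch assumes the shape hinted at above (a `data` object whose `candidate` key holds either a list of records or a single record); that shape and the sample response are assumptions, so adjust the keys once you have inspected a real reply.

```python
from typing import Any


def first_record(response: dict[str, Any], key: str) -> dict[str, Any]:
    """Return the first extracted record under response["data"][key], or {}."""
    data = response.get("data")
    if not isinstance(data, dict):
        return {}
    value = data.get(key)
    if isinstance(value, list):
        value = value[0] if value else None
    return value if isinstance(value, dict) else {}


def missing_fields(record: dict[str, Any], expected: list[str]) -> list[str]:
    """List expected fields that are absent or empty in the record."""
    return [name for name in expected if not record.get(name)]


# Hypothetical response used only to exercise the helpers.
sample = {"data": {"candidate": [{"full_name": "Alex Rivera", "email": ""}]}}
record = first_record(sample, "candidate")
print(missing_fields(record, ["full_name", "email", "phone"]))  # ['email', 'phone']
```

The list of missing fields is a natural input for deciding which descriptions to tighten or which records need a second pass.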

Step 2: Structure LinkedIn-style profile text

Scrapers often return a single blob: headline, About, Experience. Use a profile schema (same json use case).

"schema": {
  "profile": [
    "name::str::Person name",
    "headline::str::Professional headline",
    "location::str::Profile location",
    "current_role::str::Current job title",
    "current_company::str::Current company"
  ]
}

Paste text from profiles.jsonl (profile-001) as input. Parse the response the same way as Step 1.
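
Field specs in both schemas follow the `name::type::description` pattern shown above, so they can be generated from plain tuples instead of hand-written strings. This is a small sketch: `field_spec` and `build_schema` are hypothetical helpers, and `str` is the only type that appears in this tutorial.

```python
def field_spec(name: str, description: str, type_name: str = "str") -> str:
    """Format one field as name::type::description."""
    if "::" in name or "::" in description:
        raise ValueError("'::' is the separator and cannot appear inside a field")
    return f"{name}::{type_name}::{description}"


def build_schema(entity: str, fields: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Build {entity: [field specs]} from (name, description) pairs."""
    return {entity: [field_spec(name, description) for name, description in fields]}


profile_schema = build_schema(
    "profile",
    [
        ("name", "Person name"),
        ("headline", "Professional headline"),
        ("location", "Profile location"),
        ("current_role", "Current job title"),
        ("current_company", "Current company"),
    ],
)
```

The resulting `profile_schema` matches the schema fragment above and can be placed under `metadata.schema` unchanged.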

This tutorial assumes you already have compliant text from your pipeline. ZeroGPU does not scrape LinkedIn for you.

Step 3: Tag skills in a job post (NER)

When you need categories instead of fixed columns, set metadata.usecase to ner and pass labels:

curl --location 'https://api.zerogpu.ai/v1/responses' \
  --header 'content-type: application/json' \
  --header 'x-api-key: YOUR_API_KEY' \
  --header 'x-project-id: YOUR_PROJECT_ID' \
  --data '{
    "model": "gliner2-base-v1",
    "input": "We are hiring a Backend Engineer to design REST APIs in Python, deploy on AWS EKS, and maintain PostgreSQL databases.",
    "metadata": {
      "usecase": "ner",
      "labels": ["programming language", "database", "cloud platform", "framework"],
      "threshold": 0.3
    }
  }'

Lower the threshold to recall more spans; raise it when precision matters more.
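
One way to explore this trade-off without re-sending requests is to call the API once with a low threshold and filter the spans client-side. The sketch below assumes each returned span is a dict with `text`, `label`, and `score` keys, which is a guess at the response shape rather than documented behavior, and the sample spans and scores are invented for illustration.

```python
from typing import Any


def filter_spans(spans: list[dict[str, Any]], threshold: float) -> list[dict[str, Any]]:
    """Keep spans whose confidence score meets the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    # Assumption: spans without a score are treated as score 0.0.
    return [span for span in spans if span.get("score", 0.0) >= threshold]


# Hypothetical spans; the scores are made up, not real model output.
spans = [
    {"text": "Python", "label": "programming language", "score": 0.92},
    {"text": "PostgreSQL", "label": "database", "score": 0.81},
    {"text": "REST", "label": "framework", "score": 0.34},
]

recall_first = filter_spans(spans, 0.3)     # keeps all three spans
precision_first = filter_spans(spans, 0.6)  # drops the low-confidence "REST" span
```

Comparing the two lists against a handful of labeled job posts is usually enough to pick a starting threshold.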

Step 4: Run the full example dataset

Batch all synthetic resumes or profiles locally:

cd tutorials/data-extraction/scripts
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
export ZEROGPU_API_KEY="zgpu-..."
export ZEROGPU_PROJECT_ID="your-project-uuid"
.venv/bin/python run_batch_extraction.py --dataset resumes --limit 3
.venv/bin/python run_batch_extraction.py --dataset profiles --limit 3
.venv/bin/python run_batch_extraction.py --dataset job_posts

Results are written to tutorials/data-extraction/outputs/*.jsonl. Inspect them before wiring into production ETL. For many files at once, combine this with the patterns in Batch requests (parallelism, retries).
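
As a rough illustration of the parallelism-and-retries pattern, the sketch below fans texts out over a thread pool and retries failed calls with exponential backoff. It is not the logic of run_batch_extraction.py: `extract` stands in for whatever function sends one request, and the worker count, attempt limit, and delays are arbitrary defaults.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def with_retries(call: Callable[[str], Any], text: str, attempts: int = 3, base_delay: float = 0.5) -> Any:
    """Run call(text), retrying on any exception with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call(text)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt))


def run_batch(extract: Callable[[str], Any], texts: list[str], workers: int = 4) -> list[Any]:
    """Process texts in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda text: with_retries(extract, text), texts))
```

In practice you would likely retry only on transient failures (timeouts, rate limits) rather than on every exception, and keep `workers` within your project's rate limits.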

Optional: PII on inbound text

Before storing user-submitted resumes, run PII extraction with gliner-multi-pii-v1, using either extract-pii or redact.

Production tips

Start small: 5-8 schema fields beat 20 vague ones.
Golden set: Keep 10-20 labeled examples from your real text shapes; re-run after schema changes.
Text only: PDF and image pipelines are out of scope for this tutorial; convert to text upstream.
Monitor: Use Logs and Usage in the dashboard.
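
A golden-set check can be as simple as comparing extracted records with hand-labeled ones and reporting per-field accuracy. The sketch below assumes both lists are aligned by position and compares values case-insensitively after trimming whitespace; that matching rule and the `field_accuracy` helper are illustrative choices you may want to tighten or loosen.

```python
from collections import defaultdict


def normalize(value: object) -> str:
    """Trim and lowercase so trivial formatting differences do not count as errors."""
    return str(value or "").strip().lower()


def field_accuracy(expected: list[dict], extracted: list[dict]) -> dict[str, float]:
    """Return the fraction of golden records matched, per labeled field."""
    if len(expected) != len(extracted):
        raise ValueError("expected and extracted must have the same length")
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for gold, got in zip(expected, extracted):
        for name, gold_value in gold.items():
            totals[name] += 1
            if normalize(got.get(name)) == normalize(gold_value):
                hits[name] += 1
    return {name: hits[name] / totals[name] for name in totals}
```

Re-running this after each schema change shows which fields regressed, which is more actionable than a single overall score.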

Go deeper

Data extraction recipe

API reference-style examples for json, NER, and PII.

Example dataset (GitHub)

Synthetic resumes, profiles, and job posts (JSONL).

Companion article

Long-form walkthrough in the docs repo.

gliner2-base-v1 playground

Try schemas interactively in the model catalog.
