# 🎓 Resume & profile extraction

> Step-by-step guide with example dataset. Text-only extraction, no OCR.

This tutorial walks through **three production-style use cases** for structured data extraction with the [gliner2-base-v1](/api-reference/models/gliner2-base-v1) model. You will work with plain text only (resumes, LinkedIn-style exports, and job posts); OCR and PDF parsing are out of scope.

<Note>
  All sample people and companies in the [example
  dataset](https://github.com/zerogpu/cookbook/tree/main/demos/data-extraction/dataset)
  are **synthetic and fictional**.
</Note>

## 🎥 Watch the Video Guide

<iframe width="560" height="315" src="https://www.youtube.com/embed/QBZDKzjuA60?si=Mwkylkx8ioK97jUl" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen />

## What you will build

| Step | Input                | Output                                       |
| ---- | -------------------- | -------------------------------------------- |
| 1    | Plain-text resume    | `candidate` JSON (name, title, skills, …)    |
| 2    | Scraped profile text | `profile` JSON (headline, role, location, …) |
| 3    | Job description      | Labeled skill/tool spans (`ner`)             |

**Model (steps 1-3):** `gliner2-base-v1`\
**Endpoint:** `POST https://api.zerogpu.ai/v1/responses`

Get your **API key** and **project ID** from the [dashboard](https://zerogpu.ai). See [Authentication](/docs/platform#authentication).
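The examples below hard-code `YOUR_API_KEY` and `YOUR_PROJECT_ID`. In your own code you may prefer to read them from the environment. A minimal sketch, assuming the variable names `ZEROGPU_API_KEY` and `ZEROGPU_PROJECT_ID` that Step 4 exports; `zerogpu_headers` is a hypothetical helper, not part of any ZeroGPU SDK:

```python
import os


def zerogpu_headers() -> dict:
    """Build request headers for POST /v1/responses from environment variables.

    ASSUMPTION: credentials are exported as ZEROGPU_API_KEY and
    ZEROGPU_PROJECT_ID (the names used by the Step 4 batch script).
    """
    api_key = os.environ.get("ZEROGPU_API_KEY")
    project_id = os.environ.get("ZEROGPU_PROJECT_ID")
    if not api_key or not project_id:
        raise RuntimeError("Set ZEROGPU_API_KEY and ZEROGPU_PROJECT_ID first")
    return {
        "content-type": "application/json",
        "x-api-key": api_key,
        "x-project-id": project_id,
    }
```

Failing early on a missing variable gives a clearer error than an authentication failure from the API.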

## Before you start

Clone or browse the example data and scripts in the cookbook repo:

```
demos/data-extraction/
├── dataset/          # resumes.jsonl, profiles.jsonl, job_posts.jsonl
├── schemas/          # resume.json, linkedin-profile.json
├── scripts/          # run_batch_extraction.py
└── blog-post.md      # companion article (optional read)
```

GitHub: [zerogpu/cookbook/demos/data-extraction](https://github.com/zerogpu/cookbook/tree/main/demos/data-extraction) (see the [README](https://github.com/zerogpu/cookbook/blob/main/demos/data-extraction/README.md) for setup).

## Step 1: Extract a candidate from a resume

Open `dataset/resumes.jsonl` and pick `resume-001` (synthetic). Define the fields you want under `metadata.schema` and set `metadata.usecase` to **`json`**.
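Each schema entry packs three parts into one string separated by `::`: the field name, its type, and a natural-language description (for example `full_name::str::Candidate full name`). If you keep field definitions in code, a small helper can generate those strings. A sketch; `schema_fields` is a hypothetical name, and only the `str` type used in this tutorial is shown:

```python
def schema_fields(fields: dict[str, str], dtype: str = "str") -> list[str]:
    """Turn {name: description} into "name::dtype::description" strings."""
    return [f"{name}::{dtype}::{description}" for name, description in fields.items()]


candidate_schema = {
    "candidate": schema_fields({
        "full_name": "Candidate full name",
        "email": "Email address",
        "current_title": "Most recent job title",
    })
}
print(candidate_schema["candidate"][0])  # full_name::str::Candidate full name
```

The result can be dropped into `metadata.schema` as-is; the description is what steers the model, so it is worth keeping specific.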

<CodeGroup>
  ```bash cURL theme={null}
  curl --location 'https://api.zerogpu.ai/v1/responses' \
    --header 'content-type: application/json' \
    --header 'x-api-key: YOUR_API_KEY' \
    --header 'x-project-id: YOUR_PROJECT_ID' \
    --data '{
      "model": "gliner2-base-v1",
      "input": "ALEX RIVERA\nSan Francisco, CA | alex.rivera@example-mail.io | (415) 555-0198\n\nSUMMARY\nBackend engineer with 7 years building APIs and data pipelines...",
      "metadata": {
        "usecase": "json",
        "schema": {
          "candidate": [
            "full_name::str::Candidate full name",
            "email::str::Email address",
            "phone::str::Phone number",
            "location::str::City, state, or country",
            "current_title::str::Most recent job title",
            "current_company::str::Most recent employer",
            "skills::str::Comma-separated key skills mentioned"
          ]
        }
      }
    }'
  ```

  ```python Python theme={null}
  import json
  import requests

  url = "https://api.zerogpu.ai/v1/responses"
  headers = {
      "content-type": "application/json",
      "x-api-key": "YOUR_API_KEY",
      "x-project-id": "YOUR_PROJECT_ID",
  }

  # First record of dataset/resumes.jsonl; path is relative to the cookbook repo root
  with open("demos/data-extraction/dataset/resumes.jsonl", encoding="utf-8") as f:
      record = json.loads(f.readline())

  payload = {
      "model": "gliner2-base-v1",
      "input": record["text"],
      "metadata": {
          "usecase": "json",
          "schema": {
              "candidate": [
                  "full_name::str::Candidate full name",
                  "email::str::Email address",
                  "phone::str::Phone number",
                  "location::str::City, state, or country",
                  "current_title::str::Most recent job title",
                  "current_company::str::Most recent employer",
                  "skills::str::Comma-separated key skills mentioned",
              ]
          },
      },
  }
  resp = requests.post(url, headers=headers, json=payload, timeout=30)
  resp.raise_for_status()  # raise on HTTP 4xx/5xx, e.g. a bad API key
  body = resp.json()

  # The extracted JSON arrives as a string inside the response envelope
  parsed = json.loads(body["output"][0]["content"][0]["text"])
  # GLiNER json responses are usually nested: data.candidate[0]
  candidate = parsed["data"]["candidate"][0]
  print(json.dumps(candidate, indent=2))
  ```
</CodeGroup>

**Check your result:** You should see a `candidate` object (often under `data.candidate[0]`) with populated strings. Fields that do not appear in the source text may come back empty. If a field is present in the text but comes back empty or wrong, tighten its description first, and add a second extraction pass only for fields that still fail.
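To check results across many records rather than by eye, you can unwrap the response and list which schema fields came back blank. A minimal sketch, assuming the response envelope shown in the Step 1 Python example; `extract_object` and `empty_fields` are hypothetical helpers, and the sample body is hand-built for illustration, not real API output:

```python
import json


def extract_object(body: dict, key: str) -> dict:
    """Return the first `key` object from a json-usecase /v1/responses body.

    ASSUMPTION: the envelope shown in Step 1, where the model's JSON is a
    string at output[0].content[0].text, usually nested as data.<key>[0].
    """
    parsed = json.loads(body["output"][0]["content"][0]["text"])
    container = parsed.get("data", parsed)  # fall back to a flat top level
    value = container.get(key, {})
    if isinstance(value, list):  # nested form: take the first match
        return value[0] if value else {}
    return value


def empty_fields(obj: dict, expected: list[str]) -> list[str]:
    """List schema fields that are missing, None, or blank in `obj`."""
    return [name for name in expected if not str(obj.get(name) or "").strip()]


# Hand-built response body for illustration (not real API output)
sample = {"output": [{"content": [{"text": json.dumps(
    {"data": {"candidate": [{"full_name": "Alex Rivera", "email": ""}]}}
)}]}]}
candidate = extract_object(sample, "candidate")
print(empty_fields(candidate, ["full_name", "email", "phone"]))  # ['email', 'phone']
```

Fields that are blank across many records, even though the text contains them, are the ones whose descriptions need tightening.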

## Step 2: Structure LinkedIn-style profile text

Scrapers often return a single blob: headline, About, Experience. Use a **`profile`** schema (same `json` use case).

```json theme={null}
"schema": {
  "profile": [
    "name::str::Person name",
    "headline::str::Professional headline",
    "location::str::Profile location",
    "current_role::str::Current job title",
    "current_company::str::Current company"
  ]
}
```

Paste the text of `profile-001` from `dataset/profiles.jsonl` as `input`. Parse the response as in Step 1; the extracted object should appear under `data.profile[0]` instead of `data.candidate[0]`.
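Rather than relying on a record's position in the file, you can look it up by ID. A minimal sketch with a hypothetical `find_record` helper; it assumes each record stores its identifier (such as `profile-001`) under a key named `id`, which you should confirm against the dataset:

```python
import json
from typing import Iterable


def find_record(lines: Iterable[str], record_id: str, id_field: str = "id") -> dict:
    """Return the first JSONL record whose `id_field` equals `record_id`.

    ASSUMPTION: each record stores its identifier (e.g. "profile-001") under
    a key named "id"; pass id_field=... if the dataset uses another name.
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if record.get(id_field) == record_id:
            return record
    raise KeyError(f"record {record_id!r} not found")
```

For example, open `demos/data-extraction/dataset/profiles.jsonl`, pass the file object as `lines` with `record_id="profile-001"`, and send the returned record's `text` field (the field Step 1 reads) as `input`.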

<Info>
  This tutorial assumes you already have **compliant text** from your pipeline.
  ZeroGPU does not scrape LinkedIn for you.
</Info>

## Step 3: Tag skills in a job post (NER)

When you need **categories** instead of fixed columns, set `metadata.usecase` to **`ner`** and pass `labels`:

<CodeGroup>
  ```bash cURL theme={null}
  curl --location 'https://api.zerogpu.ai/v1/responses' \
    --header 'content-type: application/json' \
    --header 'x-api-key: YOUR_API_KEY' \
    --header 'x-project-id: YOUR_PROJECT_ID' \
    --data '{
      "model": "gliner2-base-v1",
      "input": "We are hiring a Backend Engineer to design REST APIs in Python, deploy on AWS EKS, and maintain PostgreSQL databases.",
      "metadata": {
        "usecase": "ner",
        "labels": ["programming language", "database", "cloud platform", "framework"],
        "threshold": 0.3
      }
    }'
  ```

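  ```python Python theme={null}
  # Reuses json, requests, url, and headers from Step 1
  with open("demos/data-extraction/dataset/job_posts.jsonl", encoding="utf-8") as f:
      record = json.loads(f.readline())

  payload = {
      "model": "gliner2-base-v1",
      "input": record["text"],
      "metadata": {
          "usecase": "ner",
          "labels": ["programming language", "database", "cloud platform", "framework"],
          "threshold": 0.3,  # minimum confidence for a span to be returned
      },
  }
  resp = requests.post(url, headers=headers, json=payload, timeout=30)
  resp.raise_for_status()
  print(json.dumps(resp.json(), indent=2))
  ```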
</CodeGroup>

Lower `threshold` to return more spans (higher recall, more false positives); raise it when precision matters more.
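If the `ner` response includes a confidence score for each span (check the model reference; the key names below are assumptions), you can apply a stricter cut-off locally instead of re-sending the request. This only works in one direction: a local filter can drop spans but cannot recover ones below the request `threshold`, so for more recall you still lower `threshold` in the payload. A sketch with a hypothetical `group_spans` helper:

```python
from collections import defaultdict


def group_spans(spans: list[dict], min_score: float) -> dict[str, list[str]]:
    """Group span texts by label, keeping spans scored at or above min_score.

    ASSUMPTION: each span is a dict with "text", "label", and "score" keys;
    rename them to match the actual ner response.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for span in spans:
        if span["score"] < min_score:
            continue  # below the local cut-off
        texts = grouped[span["label"]]
        if span["text"] not in texts:  # drop repeated mentions
            texts.append(span["text"])
    return dict(grouped)


# Hypothetical spans with made-up scores, for illustration only
spans = [
    {"text": "Python", "label": "programming language", "score": 0.92},
    {"text": "PostgreSQL", "label": "database", "score": 0.88},
    {"text": "AWS EKS", "label": "cloud platform", "score": 0.41},
]
print(group_spans(spans, 0.3))  # keeps all three labels
print(group_spans(spans, 0.5))  # drops the cloud platform span
```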

## Step 4: Run the full example dataset

Run the batch script over the synthetic resumes, profiles, and job posts locally (`--limit` caps how many records are processed):

```bash theme={null}
cd cookbook/demos/data-extraction/scripts
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
export ZEROGPU_API_KEY="zgpu-..."
export ZEROGPU_PROJECT_ID="your-project-uuid"
.venv/bin/python run_batch_extraction.py --dataset resumes --limit 3
.venv/bin/python run_batch_extraction.py --dataset profiles --limit 3
.venv/bin/python run_batch_extraction.py --dataset job_posts
```

Results are written to `demos/data-extraction/outputs/*.jsonl`. Inspect them before wiring them into a production ETL pipeline.
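A quick first check is whether every output line parses as JSON. A minimal sketch with a hypothetical `summarize_outputs` helper; it assumes only that each non-blank line is meant to be one JSON object and makes no assumption about the record layout `run_batch_extraction.py` writes:

```python
import json
from pathlib import Path


def summarize_outputs(out_dir: str) -> dict[str, dict[str, int]]:
    """Count parseable records and unparseable lines in each *.jsonl file.

    ASSUMPTION: every non-blank line is meant to be one JSON object.
    """
    summary = {}
    for path in sorted(Path(out_dir).glob("*.jsonl")):
        ok = bad = 0
        for line in path.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue  # ignore blank lines
            try:
                json.loads(line)
                ok += 1
            except json.JSONDecodeError:
                bad += 1
        summary[path.name] = {"records": ok, "unparseable": bad}
    return summary
```

Call it as `summarize_outputs("demos/data-extraction/outputs")` from the cookbook root, then spot-check individual records for empty or wrong fields.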

To process many files at once, combine this workflow with the [Batch API](/api-reference/create-batch), which adds parallelism and retries.
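If you instead call `/v1/responses` directly from your own script, you need client-side parallelism and retries. A minimal sketch; `with_retries` and `run_parallel` are hypothetical helpers, not part of ZeroGPU or the cookbook script, and the backoff schedule is an arbitrary choice:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def with_retries(fn: Callable[[T], R], attempts: int = 3, base_delay: float = 1.0) -> Callable[[T], R]:
    """Wrap fn so a failure is retried with exponential backoff.

    Waits base_delay, 2 * base_delay, ... between attempts. This retries on
    any exception; real code should retry only transient errors
    (timeouts, HTTP 429, HTTP 5xx).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def wrapped(item: T) -> R:
        for attempt in range(attempts):
            try:
                return fn(item)
            except Exception:
                if attempt == attempts - 1:
                    raise  # out of attempts: surface the last error
                time.sleep(base_delay * 2 ** attempt)
        raise AssertionError("unreachable")

    return wrapped


def run_parallel(fn: Callable[[T], R], items: list[T], workers: int = 4) -> list[R]:
    """Apply fn to each item on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))
```

For example, `run_parallel(with_retries(extract_one), texts)`, where `extract_one` is your own (hypothetical) function that posts one text and returns the parsed result. Keep `workers` modest so you do not flood the API.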

## Optional: PII on inbound text

Before storing user-submitted resumes, run them through the [gliner-multi-pii-v1](/api-reference/models/gliner-multi-pii-v1) model with the `extract-pii` use case (to find PII spans) or `redact` (to mask them).
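A sketch of the request body, assuming `extract-pii` and `redact` are values of `metadata.usecase` in the same way `json` and `ner` are selected above; confirm this and any extra parameters (such as which entity types to detect) on the model page. `pii_payload` is a hypothetical helper:

```python
def pii_payload(text: str, usecase: str = "redact") -> dict:
    """Build a /v1/responses payload for PII extraction or redaction.

    ASSUMPTION: "extract-pii" and "redact" are values of metadata.usecase,
    in the same way "json" and "ner" are selected in Steps 1-3.
    """
    if usecase not in ("extract-pii", "redact"):
        raise ValueError("usecase must be 'extract-pii' or 'redact'")
    return {
        "model": "gliner-multi-pii-v1",
        "input": text,
        "metadata": {"usecase": usecase},
    }
```

Send it with the same `url` and `headers` as Step 1.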

## Production tips

* **Start small:** 5-8 schema fields beat 20 vague ones.
* **Golden set:** Keep 10-20 labeled examples from your real text shapes; re-run after schema changes.
* **Text only:** PDF and image pipelines are out of scope for this tutorial; convert to text upstream.
* **Monitor:** Use [Logs](/docs/platform#usage-and-logs) and [Usage](/docs/platform#usage-and-logs) in the dashboard.
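The golden-set tip can be automated with a small scoring function that you re-run after each schema change. A minimal sketch, assuming you store golden labels and model output as lists of flat field dicts in the same order; `field_accuracy` is a hypothetical helper, and exact match after normalizing case and whitespace is a deliberately strict choice (free-text fields such as `skills` may need a looser comparison):

```python
def field_accuracy(golden: list[dict], predicted: list[dict]) -> dict[str, float]:
    """Per-field exact-match rate between labeled examples and model output.

    golden[i] and predicted[i] must describe the same source text.
    Strings are compared after collapsing whitespace and lowercasing.
    """
    if len(golden) != len(predicted):
        raise ValueError("golden and predicted must be the same length")
    if not golden:
        return {}

    def norm(value) -> str:
        return " ".join(str(value or "").split()).lower()

    fields = sorted({name for example in golden for name in example})
    return {
        name: sum(norm(g.get(name)) == norm(p.get(name)) for g, p in zip(golden, predicted)) / len(golden)
        for name in fields
    }


# Toy example: one golden record vs. one model output
golden = [{"full_name": "Alex Rivera", "current_title": "Backend Engineer"}]
predicted = [{"full_name": "alex  rivera", "current_title": "Software Engineer"}]
print(field_accuracy(golden, predicted))  # {'current_title': 0.0, 'full_name': 1.0}
```

A drop in any field's score after a schema edit points directly at the description that needs another look.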

## Go deeper

<CardGroup cols={2}>
  <Card title="gliner2-base-v1 model" icon="book" href="/api-reference/models/gliner2-base-v1">
    API reference and schema examples for json and NER extraction.
  </Card>

  <Card title="Example dataset (GitHub)" icon="github" href="https://github.com/zerogpu/cookbook/tree/main/demos/data-extraction/dataset">
    Synthetic resumes, profiles, and job posts (JSONL).
  </Card>

  <Card title="Companion article" icon="newspaper" href="https://github.com/zerogpu/cookbook/blob/main/demos/data-extraction/blog-post.md">
    Long-form walkthrough in the cookbook repo.
  </Card>

  <Card title="gliner2-base-v1 playground" icon="play" href="/api-reference/models/gliner2-base-v1">
    Try schemas interactively in the model catalog.
  </Card>
</CardGroup>
