synthesize-data

Name: synthesize-data
Author: surus-lat

Generate synthetic benchmark examples from a task spec. Invoked from setup-data when there is no data, or directly when the user asks to generate examples. Produces a JSONL file in .data/<benchmark-name>/.

Install

mkdir -p .claude/skills/synthesize-data && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/14179" && unzip -o skill.zip -d .claude/skills/synthesize-data && rm skill.zip

Installs to .claude/skills/synthesize-data


About this skill

Synthesize Data Skill

Generates synthetic {text, expected} pairs from the task: section of the benchmark spec.

It can be invoked from setup-data (when Path 3 is selected) or directly, for example:

"Generate 30 examples for my invoice benchmark"

Hard Rule: Image Tasks

If task.input.type is image, document, or pdf — stop immediately:

"I can generate expected outputs but not the images themselves. You need real images to run this benchmark. Use setup-data (Path 1 or 2) to supply your own images."

Do not attempt generation. Return to the user.
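The hard rule can be expressed as a pre-flight check that runs before any generation. This is a minimal sketch, assuming the benchmark spec has already been parsed into a Python dict; the names `check_input_type`, `BLOCKED_INPUT_TYPES`, and `BLOCK_MESSAGE` are hypothetical and are not part of `src.data_generator`.

```python
from typing import Optional

# Input types the generator cannot synthesize (from the hard rule above).
BLOCKED_INPUT_TYPES = {"image", "document", "pdf"}

BLOCK_MESSAGE = (
    "I can generate expected outputs but not the images themselves. "
    "You need real images to run this benchmark. "
    "Use setup-data (Path 1 or 2) to supply your own images."
)


def check_input_type(spec: dict) -> Optional[str]:
    """Return the stop message for image-like inputs, or None if generation may proceed.

    Assumes a parsed spec shaped like {"task": {"input": {"type": ...}}};
    missing keys are treated as "not blocked".
    """
    input_type = spec.get("task", {}).get("input", {}).get("type", "")
    if str(input_type).lower() in BLOCKED_INPUT_TYPES:
        return BLOCK_MESSAGE
    return None


# Usage: stop before calling the generator if the task needs real images.
message = check_input_type({"task": {"input": {"type": "pdf"}}})
if message is not None:
    print(message)
```

Returning the message instead of raising keeps the decision with the caller, which matches the rule's instruction to hand control back to the user.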

Steps

Read the benchmark spec (it must already have a valid task: section)
Check the input type and block image tasks (see above)
Run the data generator:

from src.data_generator import generate_data

path = generate_data("<path-to-spec>", count=30)
# path → .data/<benchmark-name>/train.jsonl

Or via CLI (if running as a subprocess):

python -c "from src.data_generator import generate_data; generate_data('<path-to-spec>', count=30)"

Update the benchmark spec's data: section to point to the generated file:

data:
  source: local
  path: .data/<benchmark-name>/train.jsonl

Confirm success:

Generated 30 examples → .data/my-benchmark/train.jsonl
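Step 4 (pointing data: at the generated file) amounts to a small dict update. This sketch assumes the spec is YAML that has been parsed into a dict (for example with PyYAML's `yaml.safe_load`, and written back with `yaml.safe_dump`); `point_spec_at_data` is a hypothetical helper, and the file I/O is left out.

```python
def point_spec_at_data(spec: dict, path: str) -> dict:
    """Replace the spec's data: section with a local source pointing at `path`.

    Any previous data settings are overwritten, so the benchmark reads
    only the freshly generated file.
    """
    spec["data"] = {"source": "local", "path": path}
    return spec


spec = {"task": {"type": "extraction"}}
point_spec_at_data(spec, ".data/my-benchmark/train.jsonl")
print(spec["data"])  # {'source': 'local', 'path': '.data/my-benchmark/train.jsonl'}
```

Overwriting the whole section, rather than merging, avoids leaving a stale key from an earlier data source next to the new local path.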

What the Generator Does

Reads task.type, task.input.description, and task.output.fields from the benchmark spec
Builds a generation prompt from the spec
Calls the generator model N times in parallel (async)
Validates each result against the field schema
Saves to .data/<benchmark-name>/train.jsonl

The generator model is an internal engine detail — never mention it to the user.
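The pipeline above can be sketched with `asyncio.gather`. This is an illustration under stated assumptions, not the engine's code: the model call is a hypothetical injected coroutine, `task.output.fields` is assumed to map field names to type names such as `string` and `number`, and the benchmark name is assumed to live under a top-level `name` key.

```python
import asyncio
import json
from pathlib import Path
from typing import Awaitable, Callable

# Hypothetical stand-in for the internal model call: prompt in, raw JSON text out.
ModelCall = Callable[[str], Awaitable[str]]

# Assumed mapping from field type names in task.output.fields to Python types.
TYPE_MAP = {"string": str, "number": (int, float), "boolean": bool}


def build_prompt(spec: dict) -> str:
    """Build one generation prompt from task.type, task.input.description and task.output.fields."""
    task = spec["task"]
    fields = ", ".join(f"{name} ({ftype})" for name, ftype in task["output"]["fields"].items())
    return (
        f"Task type: {task['type']}\n"
        f"Input: {task['input']['description']}\n"
        f"Reply with a JSON object with keys 'text' and 'expected'; expected must contain: {fields}"
    )


def validate(example: object, fields: dict) -> bool:
    """True if the example has a string 'text' and every declared field with the right type."""
    if not isinstance(example, dict) or not isinstance(example.get("text"), str):
        return False
    expected = example.get("expected")
    if not isinstance(expected, dict):
        return False
    return all(
        name in expected and isinstance(expected[name], TYPE_MAP.get(ftype, object))
        for name, ftype in fields.items()
    )


async def generate(spec: dict, n: int, call_model: ModelCall, out_dir: str = ".data") -> Path:
    """Call the model n times concurrently, keep valid results, and write them as JSONL."""
    prompt = build_prompt(spec)
    fields = spec["task"]["output"]["fields"]
    raw_results = await asyncio.gather(*(call_model(prompt) for _ in range(n)))

    examples = []
    for raw in raw_results:
        try:
            example = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable output is dropped, not retried, in this sketch
        if validate(example, fields):
            examples.append(example)

    # spec["name"] as the benchmark name is an assumption of this sketch.
    path = Path(out_dir) / spec["name"] / "train.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for i, ex in enumerate(examples):
            record = {"id": str(i), "text": ex["text"], "expected": ex["expected"]}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```

It would be called as `asyncio.run(generate(spec, 30, my_model_call))`. Because invalid results are dropped, the file can hold fewer than N lines; whether the real generator retries instead is not stated in the skill.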

Counting

Default count: 30
User can specify: "Generate 50 examples"
Minimum useful: 10 (fewer examples make metrics unreliable)
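In practice the agent reads the count from the user's wording. The regex below is a hypothetical illustration of the counting rules (default 30, advisory minimum of 10), not how the skill actually parses requests.

```python
import re

DEFAULT_COUNT = 30      # used when the request names no number
MIN_USEFUL_COUNT = 10   # below this, metrics are unreliable (advisory, not enforced)


def resolve_count(request: str) -> int:
    """Extract N from phrases like "Generate 50 examples"; fall back to DEFAULT_COUNT."""
    match = re.search(r"(\d+)\s+examples?\b", request, flags=re.IGNORECASE)
    return int(match.group(1)) if match else DEFAULT_COUNT


for req in ["Generate 50 examples", "Generate examples for my invoice benchmark", "Generate 5 examples"]:
    n = resolve_count(req)
    print(n, "ok" if n >= MIN_USEFUL_COUNT else "below the useful minimum")
```

The minimum is only reported, never used to raise the count, since the user may deliberately want a tiny smoke-test dataset.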

Output Artifact

.data/<benchmark-name>/train.jsonl — one JSON object per line:

{"id": "0", "text": "Invoice from ...", "expected": {"vendor_name": "ACME Corp", "amount": 1250.00}}
{"id": "1", "text": "Factura de ...", "expected": {"vendor_name": "Distribuidora XYZ", "amount": 875.50}}

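To sanity-check a generated file, a reader can verify each line against the shape in the sample above. This is a sketch; `load_examples` is hypothetical, and treating `id` as a string follows the sample lines rather than a documented schema.

```python
import json
from typing import List


def load_examples(path: str) -> List[dict]:
    """Read train.jsonl and check every line has the {id, text, expected} shape shown above."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(f"line {lineno}: expected a JSON object")
            if not isinstance(record.get("id"), str):
                raise ValueError(f"line {lineno}: 'id' must be a string")
            if not isinstance(record.get("text"), str):
                raise ValueError(f"line {lineno}: 'text' must be a string")
            if not isinstance(record.get("expected"), dict):
                raise ValueError(f"line {lineno}: 'expected' must be an object")
            examples.append(record)
    return examples
```

Failing on the first malformed line, with its line number, makes a bad record easy to locate in a file of a few dozen examples.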
After Generation

Tell the user:

"Generated {N} examples → .data/{benchmark-name}/train.jsonl. Your benchmark spec has been updated. Run run-benchmark to evaluate."

If count < 10, warn:

"Note: {N} examples is a small dataset — results may not be statistically reliable. Consider generating more."

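Both messages can be assembled in one place so the small-dataset note is never forgotten. This is a hypothetical helper; its wording closely follows the templates above.

```python
from typing import List

MIN_USEFUL_COUNT = 10


def completion_messages(n: int, benchmark_name: str) -> List[str]:
    """Build the post-generation messages; add the small-dataset note when n < 10."""
    messages = [
        f"Generated {n} examples → .data/{benchmark_name}/train.jsonl. "
        "Your benchmark spec has been updated. Run run-benchmark to evaluate."
    ]
    if n < MIN_USEFUL_COUNT:
        messages.append(
            f"Note: {n} examples is a small dataset, so results may not be statistically "
            "reliable. Consider generating more."
        )
    return messages


for line in completion_messages(8, "my-benchmark"):
    print(line)
```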

