sir-convert-a-lot-qwen-finetuning

Name: sir-convert-a-lot-qwen-finetuning
Author: paunchygent

Install

mkdir -p .claude/skills/sir-convert-a-lot-qwen-finetuning && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/14778" && unzip -o skill.zip -d .claude/skills/sir-convert-a-lot-qwen-finetuning && rm skill.zip

Installs to .claude/skills/sir-convert-a-lot-qwen-finetuning

Activation

This is the description your AI agent reads to decide when to run this skill — the better it matches your request, the more reliably it fires.

Model-specific operator skill for Qwen3-TTS fine-tuning on Hemma and Colab. Use when the task is specifically about Qwen TTS training, Swedish language expansion with Qwen, Qwen preprocessing or runtime policy, or deciding whether a fine-tuned Qwen model should enter the Sir Convert-a-Lot sidecar candidate lane.

313 chars ✓ has a “when” trigger; longer than Claude Code's old 250-char listing cap (fine on current versions)

About this skill

Sir Convert-a-Lot Qwen Finetuning

Use This Skill When

The user wants to fine-tune Qwen/Qwen3-TTS-12Hz-1.7B-Base.
The user wants Swedish support, not just a single custom voice.
The task involves choosing between Hemma and Colab H100.
The task involves ROCm, Triton flash attention, or GPU container policy.
The task involves Swedish speech data curation, preprocessing, or evaluation.
The task involves deciding whether a trained model is good enough to become a sidecar candidate later.

Do Not Use This Skill For

Normal sidecar benchmarking that does not involve model training.
Chatterbox, F5, OpenVoice, or MMS implementation work unless the task is explicitly comparing them against a future Qwen fine-tuned candidate.
Generic speech-model training questions when no Qwen-specific decision is in play. For those, use the broader speech-model-finetuning-on-hemma skill.

Source of Truth

Use this skill together with the broader local skills and the repo documents below:

.codex/skills/speech-model-finetuning-on-hemma/SKILL.md
.codex/skills/sir-convert-a-lot-colab-hemma/SKILL.md
docs/runbooks/runbook-qwen3-swedish-finetuning-on-hemma-and-colab.md
docs/runbooks/runbook-hemma-devops-and-gpu.md
docs/backlog/epics/epic-08-qwen3-tts-swedish-language-expansion-fine-tuning-on-hemma-and-colab.md
docs/backlog/stories/story-24-swedish-multi-speaker-corpus-preprocessing-and-evaluation-for-qwen3-tts.md
docs/backlog/tasks/task-116-expand-rixvox-staging-and-run-a-sustained-detached-row-processing-window-for-the-bounded-hemma-pilot.md
docs/backlog/stories/story-25-containerized-qwen3-tts-swedish-full-finetune-baseline-on-hemma-and-colab.md
docs/backlog/tasks/task-141-define-frozen-qwen-pilot-dataset-use-for-finetuning.md
docs/backlog/tasks/task-142-materialize-frozen-qwen-pilot-training-bundle-for-task-101.md
docs/backlog/stories/story-32-consolidate-qwen-experiment-governance-and-surface-taxonomy.md
.codex/rules/096-qwen-experiment-governance.md
docs/decisions/0006-hemma-sidecar-tts-architecture-and-non-pdf-gpu-governance.md
docs/decisions/0007-reusable-multi-backend-tts-sidecar-capability-contract.md

Upstream truth to verify before major claims or runtime changes:

First Move

Before proposing anything, classify the request into one of these lanes:

1. Benchmark lane
   - Serving/runtime truth only.
   - Current repo home: Task 79 / Task 98.
2. Single-speaker adaptation lane
   - Useful for voice transfer experiments.
   - Not the same as general Swedish support.
3. Language-expansion lane
   - Multi-speaker Swedish.
   - Current repo home: Epic 08.

If the user says "general Swedish support," always choose lane 3 unless they explicitly narrow the scope.
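Here is a minimal sketch of the lane triage as a Python function, assuming a simple keyword heuristic. The keyword cues and the `Lane` member names are hypothetical and are not part of the repo. The only rule taken from the skill is that "general Swedish support" maps to lane 3 (language expansion) unless the user explicitly narrows the scope. A request that matches nothing returns `None`, so the operator asks for scope instead of guessing.

```python
from enum import Enum
from typing import Optional


class Lane(Enum):
    """Hypothetical names for the three lanes, in the order the skill lists them."""
    BENCHMARK = 1
    SINGLE_SPEAKER_ADAPTATION = 2
    LANGUAGE_EXPANSION = 3


# Hypothetical keyword cues; real triage means reading the request, not matching strings.
_SINGLE_SPEAKER_CUES = ("single voice", "single speaker", "voice transfer", "custom voice")
_BENCHMARK_CUES = ("benchmark", "serving", "latency", "runtime")


def classify_request(text: str, explicitly_narrowed: bool = False) -> Optional[Lane]:
    """Return the lane for a request, or None when the operator should ask for scope."""
    lowered = text.lower()
    # Skill rule: "general Swedish support" is lane 3 unless the user narrows the scope.
    if "general swedish support" in lowered and not explicitly_narrowed:
        return Lane.LANGUAGE_EXPANSION
    if any(cue in lowered for cue in _SINGLE_SPEAKER_CUES):
        return Lane.SINGLE_SPEAKER_ADAPTATION
    if any(cue in lowered for cue in _BENCHMARK_CUES):
        return Lane.BENCHMARK
    if "multi-speaker" in lowered or "swedish" in lowered:
        return Lane.LANGUAGE_EXPANSION
    return None
```

The checks run in this order because the general-support rule must win over incidental keywords. Cues that only hint at a lane are checked afterwards.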

Core Project Position

The main project target is full fine-tuning of the 1.7B base model.
Hemma is viable for bounded pilot work.
Colab H100 is the scale-up lane, not the only viable lane.
The end goal is general Swedish support, not a single teacher voice.
The first bounded Task 101 Hemma pilot must consume a deterministic training bundle projected from the frozen pilot root, not the generic promoted Task 103 preprocessing root.
The canonical repo-owned materialization surface for that bundle is:
- pdm run qwen-pilot-bundle build
The detached Qwen pilot runtime must record both the train and held-out eval manifest paths in launch/status/report metadata while staying explicit that upstream sft_12hz.py is still train-only and does not perform in-training evaluation.
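The bundle contract above can be sketched in Python. The sketch assumes the frozen pilot root is available as a list of row dicts with an `id` field, plus a precomputed train/eval assignment per row. Every name here (`project_bundle`, `split_of`, the metadata keys) is hypothetical and is not the interface of `pdm run qwen-pilot-bundle build`. The sketch only shows two things: why sorting plus a canonical digest makes the projection deterministic, and how launch metadata can carry both manifest paths while stating that no evaluation happens inside `sft_12hz.py`.

```python
import hashlib
import json
from pathlib import Path


def project_bundle(frozen_rows: list[dict], split_of: dict[str, str]) -> dict[str, list[dict]]:
    """Deterministically project frozen rows into train/eval, ordered by row id."""
    bundle: dict[str, list[dict]] = {"train": [], "eval": []}
    # Sorting by id makes the output independent of the input row order.
    for row in sorted(frozen_rows, key=lambda r: r["id"]):
        bundle[split_of[row["id"]]].append(row)
    return bundle


def bundle_digest(bundle: dict[str, list[dict]]) -> str:
    """SHA-256 over a canonical JSON encoding, so identical bundles hash identically."""
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def launch_metadata(train_manifest: Path, eval_manifest: Path, digest: str) -> dict:
    """Record both manifests; eval is recorded for later use, not run during training."""
    return {
        "train_manifest": str(train_manifest),
        "eval_manifest": str(eval_manifest),
        "bundle_sha256": digest,
        # Upstream sft_12hz.py is train-only, so say so explicitly in the metadata.
        "in_training_eval": False,
    }
```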
Scheduled Qwen pilot runs now use the canonical 500/100/3 posture: durable checkpoint every 500 optimizer steps, held-out eval every 100 steps, retain newest 3 durable trainer-state checkpoints.
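The 500/100/3 posture can be sketched as plain step arithmetic. The sketch assumes 1-based optimizer step counts and that retention considers only durable trainer-state checkpoints. The names `SchedulePosture`, `actions_for_step`, and `checkpoints_to_prune` are illustrative and do not come from the repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulePosture:
    """Checkpoint/eval cadence; the defaults mirror the canonical 500/100/3 posture."""
    checkpoint_every: int = 500  # durable checkpoint every N optimizer steps
    eval_every: int = 100        # held-out eval every N optimizer steps
    keep_last: int = 3           # retain the newest N durable trainer-state checkpoints


def actions_for_step(step: int, posture: SchedulePosture = SchedulePosture()) -> list[str]:
    """Return the scheduled actions that fire after 1-based optimizer step `step`."""
    actions = []
    if step % posture.eval_every == 0:
        actions.append("eval")
    if step % posture.checkpoint_every == 0:
        actions.append("checkpoint")
    return actions


def checkpoints_to_prune(saved_steps: list[int], posture: SchedulePosture = SchedulePosture()) -> list[int]:
    """Return the steps of durable checkpoints older than the newest `keep_last`."""
    newest_first = sorted(saved_steps, reverse=True)
    return sorted(newest_first[posture.keep_last:])
```

Because 500 is a multiple of 100, every durable checkpoint step also receives a held-out eval. The sketch runs eval before checkpoint at those steps, but that ordering is an assumption.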
For older pre-schedule checkpoints, the canonical recovery order is: standalone held-out eval first, then resume only if the saved cursor is compatible with the current bundle contract.
If a legacy launch requires --pilot-bundle-root override, do not assume the saved intra-epoch cursor is still meaningful; treat any impossible cursor as a fail-closed condition, not a warning.
If a bounded recovery probe already produced a newer durable checkpoint with a compatible cursor, prefer that newer checkpoint for the next strict resume rather than resetting to the older legacy step.
If a preserved legacy launch still carries stale checkpoint cadence or retention values, pass explicit resume overrides so the relaunched lane truthfully matches the current 500/100/3 scheduled posture.
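The recovery rules above can be read as a short, fail-closed procedure. This sketch assumes that a checkpoint exposes its optimizer step and a saved intra-epoch batch cursor, and that the current bundle's batches-per-epoch is known. `Checkpoint`, `recover`, `run_heldout_eval`, and the override key names are hypothetical; they are not the repo's trainer API or CLI flags. The procedure has four parts:

- Run the standalone held-out eval first.
- Pick the newest durable checkpoint, so a newer compatible probe checkpoint wins over an older legacy step.
- Raise on an impossible cursor instead of warning.
- Return explicit 500/100/3 overrides for the relaunch.

```python
from dataclasses import dataclass
from typing import Callable


class ImpossibleCursorError(RuntimeError):
    """Fail-closed condition: the saved cursor cannot exist under the current bundle."""


@dataclass(frozen=True)
class Checkpoint:
    step: int            # optimizer step at which the durable checkpoint was written
    batch_in_epoch: int  # saved intra-epoch cursor (hypothetical field)


# Hypothetical override keys; pass them explicitly so a preserved legacy launch
# cannot silently keep stale cadence or retention values.
CURRENT_POSTURE_OVERRIDES = {"checkpoint_every": 500, "eval_every": 100, "keep_last": 3}


def recover(
    candidates: list[Checkpoint],
    batches_per_epoch: int,
    run_heldout_eval: Callable[[Checkpoint], float],
) -> tuple[Checkpoint, dict, float]:
    """Standalone held-out eval first, then a strict resume plan or an error."""
    if not candidates:
        raise ValueError("no durable checkpoints to recover from")
    # Prefer the newest durable checkpoint over resetting to an older legacy step.
    newest = max(candidates, key=lambda c: c.step)
    # Step 1: standalone held-out eval, independent of whether resume is allowed.
    eval_loss = run_heldout_eval(newest)
    # Step 2: resume only with a cursor that can exist under the current bundle.
    if not 0 <= newest.batch_in_epoch < batches_per_epoch:
        raise ImpossibleCursorError(
            f"step {newest.step}: cursor {newest.batch_in_epoch} "
            f"is outside [0, {batches_per_epoch})"
        )
    return newest, dict(CURRENT_POSTURE_OVERRIDES), eval_loss
```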
If a resumed Task 101 lane fails with repeated non-finite behavior, do not keep retrying blind full training runs. The canonical next step is: status -> diagnose-non-finite -> fix -> bounded retry.
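A sketch of the non-finite guard, assuming the operator has recent per-step loss values as floats. The repeat threshold of 2 is an assumption. The step labels mirror the skill's sequence and do not name real CLI subcommands.

```python
import math

# The canonical order from the skill; these are step labels, not CLI commands.
TRIAGE_SEQUENCE = ("status", "diagnose-non-finite", "fix", "bounded-retry")


def count_nonfinite(losses: list[float]) -> int:
    """Count NaN or infinite loss values."""
    return sum(1 for loss in losses if not math.isfinite(loss))


def next_steps(losses: list[float], repeat_threshold: int = 2) -> tuple[str, ...]:
    """Return the triage sequence once non-finite losses repeat; otherwise keep going."""
    if count_nonfinite(losses) >= repeat_threshold:
        # Never answer repeated non-finite behavior with another blind full run.
        return TRIAGE_SEQUENCE
    return ("continue",)
```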
Story 32 is now the governing protocol for active Qwen experiment work:
- classify every active run as provenance, mechanism, or recovery
- keep one question per run
- record the full state vector in the Task 101 progress ledger before making causal claims
- use the promotion ladder: local gate -> short bounded fresh-start run -> longer governed proof
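The Story 32 protocol can be modeled as a small record type. This is a hypothetical sketch, not the repo's ledger schema. `RunRecord`, its fields, and the caller-supplied `required_keys` are assumptions, because the skill does not enumerate what the full state vector contains. The sketch encodes three rules from the list:

- Each run has one classification and one question.
- No causal claim is allowed until every required state-vector key is recorded.
- Promotion moves one rung at a time.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class RunClass(Enum):
    PROVENANCE = "provenance"
    MECHANISM = "mechanism"
    RECOVERY = "recovery"


PROMOTION_LADDER = ("local gate", "short bounded fresh-start run", "longer governed proof")


@dataclass
class RunRecord:
    run_id: str
    run_class: RunClass
    question: str  # exactly one question per run
    state_vector: dict = field(default_factory=dict)  # mirrors the Task 101 ledger entry
    rung: int = 0  # index into PROMOTION_LADDER

    def __post_init__(self) -> None:
        if not self.question.strip():
            raise ValueError("every run must state the single question it answers")

    def may_claim_causality(self, required_keys: set[str]) -> bool:
        """Causal claims need every required state-vector key recorded first."""
        return required_keys.issubset(self.state_vector)

    def promote(self, rung_passed: bool) -> Optional[str]:
        """Advance one rung only if the current rung passed; None means stay put."""
        if not rung_passed or self.rung + 1 >= len(PROMOTION_LADDER):
            return None
        self.rung += 1
        return PROMOTION_LADDER[self.rung]
```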
Current active surface matrix:
- qwen-historical-pilot-control: provenance
- qwen-stability-lab: mechanism
- governed qwen-train launch/status fresh-start proof lane: recovery, blocked until promotion
- qwen-freshstart-proof and qwen-backward-lineage: legacy-readonly
- qwen-fallback-proof and qwen-fallback-accumulation-proof: deprecated for new work
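The surface matrix can double as a guard that runs before any new run is launched. A sketch, assuming surfaces are identified by the names above. Three choices are assumptions of this example:

- The `qwen-train` key stands in for the governed launch/status lane.
- The status strings are illustrative.
- Unknown surfaces are treated as not launchable.

```python
from typing import Optional

# Status per surface, transcribed from the current surface matrix.
SURFACE_MATRIX = {
    "qwen-historical-pilot-control": "provenance",
    "qwen-stability-lab": "mechanism",
    "qwen-train": "recovery-blocked",  # governed launch/status fresh-start proof lane
    "qwen-freshstart-proof": "legacy-readonly",
    "qwen-backward-lineage": "legacy-readonly",
    "qwen-fallback-proof": "deprecated",
    "qwen-fallback-accumulation-proof": "deprecated",
}


def may_start_new_run(surface: str, promoted: bool = False) -> bool:
    """Allow new work only on active surfaces; fail closed on anything else."""
    status: Optional[str] = SURFACE_MATRIX.get(surface)
    if status in ("provenance", "mechanism"):
        return True
    if status == "recovery-blocked":
        # The recovery lane opens only after the promotion ladder is climbed.
        return promoted
    # legacy-readonly, deprecated, or unknown surfaces take no new runs.
    return False
```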
Current operator truth:
- T221 is now resolved as negative recreated-control evidence: the recreated original-recipe shape plus only the T206 token-span fix still fails immediately under the current trainer/runtime
- treat that as provenance evidence only, not as a mechanism or recovery answer
- Qwen stability lab remains the active mechanism lane
- T225 is complete as the exact parity contract
- T226 is now complete as the committed local parity-probe surface: pdm run qwen-parity-probe run
- the live in-image historical-bundle run under task226-20260317t224307Z found no meaningful checkpoint divergence between the current and intended paths
- T219 is now recorded as negative bounded evidence under task219-20260317t180700z-a1
- T228 is now complete as the ranked closure of that family
- T229 is now complete as the narrowed rerun under task229-20260318t064712z-a1
- the target sub_talker_loss family localizes to talker_core.layer_16.input_layernorm
- T230 is now complete as the negative bounded normalization-entry rerun under task230-20260318t082049z-a1
- T231 is now complete as the explicit no-winner promotion decision
- T232 is now complete as the lane decision to stay in mechanism
- T233 is now complete as the normalization-internal rerun under task233-20260318t112544z-a1
- the first verified internal surface is now talker_core.layer_16.input_layernorm.output
- T234 is now complete under task234-20260318t123644z-a1
- no variant stayed finite or earned promotion; the strongest 0p5 member shifted the pair and line-13 sub_talker_loss cases to talker_core.layer_15.output, while line-4 still first broke at talker_core.layer_16.input_layernorm
- T235 is now complete under task235-20260318t140352z-a1
- the mixed sub_talker_loss result is repeatable: pair and line-13 stay at talker_core.layer_15.output, while line-4 stays at talker_core.layer_16.input_layernorm
- T236 is now complete under task236-20260318t145434z-a1
- the outlier is a genuine row-local seam difference: pair and line-13 stay at talker_core.layer_15.output, while line-4 stays at talker_core.layer_16.input_layernorm.output
- T237 is now complete under task237-20260318t154708z-a1
- the 1e3 fp32-output-cap winner converged pair, line-13, and line-4 sub_talker_loss to talker_core.layer_15.output
- T240 is now complete under task240-20260318t165458z-a1
- all three normative sub_talker_loss rows first broke at talker_core.layer_15.output, so the convergence class is converged_layer15_output
- T241 is now complete under task241-20260318t175714z-a1
- all three normative sub_talker_loss rows still first broke at talker_core.layer_15.output, so the classification is converged_layer15_output_residual
- T242 is now complete as the permanent Hemma bind-root contract: the repo-rendered service is installed and active, status now proves the home roots are mounted onto the canonical /srv/scratch trees, and probe confirms Docker must use /home/paunchygent/.data/sir-convert-a-lot/{build,cache} as the effective bind roots
- T243 is now complete under task243-20260318t190832z-a1
- all three normative sub_talker_loss rows first broke at talker_core.layer_15.output, so the classification is converged_layer15_output_return
- T244 is now complete under task244-20260318t193736z-a1
- all three normative sub_talker_loss rows still first broke at talker_core.layer_15.output, so the classification is converged_output_return
- T245 is now complete under task245-20260318t202916z-a1
- all three normative sub_talker_loss rows still first broke at talker_core.layer_15.output, so the classification is multiply_not_causal
- T246 is now the immediate diagnosis-only mechanism slice
- T246 must split the fp32-scaled

Content truncated.


Safety

No risk patterns found

Automated static scan of the SKILL.md and repo. A flag describes what the skill can do — not a verdict. Always review code before installing.

Source & maintenance

Updated

23d ago

Loads

~4,664 tokens