sir-convert-a-lot-qwen-finetuning
Install
mkdir -p .claude/skills/sir-convert-a-lot-qwen-finetuning && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/14778" && unzip -o skill.zip -d .claude/skills/sir-convert-a-lot-qwen-finetuning && rm skill.zip
Installs to .claude/skills/sir-convert-a-lot-qwen-finetuning
Activation
This is the description your AI agent reads to decide when to run this skill — the better it matches your request, the more reliably it fires.
Model-specific operator skill for Qwen3-TTS fine-tuning on Hemma and Colab. Use when the task is specifically about Qwen TTS training, Swedish language expansion with Qwen, Qwen preprocessing or runtime policy, or deciding whether a fine-tuned Qwen model should enter the Sir Convert-a-Lot sidecar candidate lane.
About this skill
Sir Convert-a-Lot Qwen Finetuning
Use This Skill When
- The user wants to fine-tune Qwen/Qwen3-TTS-12Hz-1.7B-Base.
- The user wants Swedish support, not just a single custom voice.
- The task involves choosing between Hemma and Colab H100.
- The task involves ROCm, Triton flash attention, or GPU container policy.
- The task involves Swedish speech data curation, preprocessing, or evaluation.
- The task involves deciding whether a trained model is good enough to become a sidecar candidate later.
Do Not Use This Skill For
- Normal sidecar benchmarking that does not involve model training.
- Chatterbox, F5, OpenVoice, or MMS implementation work unless the task is explicitly comparing them against a future Qwen fine-tuned candidate.
- Generic speech-model training questions when no Qwen-specific decision is in play. For those, use the broader speech-model-finetuning-on-hemma skill.
Source of Truth
Use this skill together with the broader local skill:
- .codex/skills/speech-model-finetuning-on-hemma/SKILL.md
- .codex/skills/sir-convert-a-lot-colab-hemma/SKILL.md
- docs/runbooks/runbook-qwen3-swedish-finetuning-on-hemma-and-colab.md
- docs/runbooks/runbook-hemma-devops-and-gpu.md
- docs/backlog/epics/epic-08-qwen3-tts-swedish-language-expansion-fine-tuning-on-hemma-and-colab.md
- docs/backlog/stories/story-24-swedish-multi-speaker-corpus-preprocessing-and-evaluation-for-qwen3-tts.md
- docs/backlog/tasks/task-116-expand-rixvox-staging-and-run-a-sustained-detached-row-processing-window-for-the-bounded-hemma-pilot.md
- docs/backlog/stories/story-25-containerized-qwen3-tts-swedish-full-finetune-baseline-on-hemma-and-colab.md
- docs/backlog/tasks/task-141-define-frozen-qwen-pilot-dataset-use-for-finetuning.md
- docs/backlog/tasks/task-142-materialize-frozen-qwen-pilot-training-bundle-for-task-101.md
- docs/backlog/stories/story-32-consolidate-qwen-experiment-governance-and-surface-taxonomy.md
- .codex/rules/096-qwen-experiment-governance.md
- docs/decisions/0006-hemma-sidecar-tts-architecture-and-non-pdf-gpu-governance.md
- docs/decisions/0007-reusable-multi-backend-tts-sidecar-capability-contract.md
Upstream truth to verify before major claims or runtime changes:
First Move
Before proposing anything, classify the request into one of these lanes:
1. Benchmark lane
   - Serving/runtime truth only.
   - Current repo home: Task 79 / Task 98.
2. Single-speaker adaptation lane
   - Useful for voice transfer experiments.
   - Not the same as general Swedish support.
3. Language-expansion lane
   - Multi-speaker Swedish.
   - Current repo home: Epic 08.
If the user says "general Swedish support," always choose lane 3 unless they explicitly narrow the scope.
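The lane rule above can be sketched in a few lines. This is only an illustration: the names Lane, REPO_HOME, and classify_request are hypothetical and not part of the repo, and the substring match stands in for a judgment the operator still has to make.

```python
from enum import Enum
from typing import Optional


class Lane(Enum):
    BENCHMARK = 1           # serving/runtime truth only
    SINGLE_SPEAKER = 2      # voice transfer experiments
    LANGUAGE_EXPANSION = 3  # multi-speaker Swedish


# Repo homes as stated in the skill; none is given for lane 2.
REPO_HOME = {
    Lane.BENCHMARK: "Task 79 / Task 98",
    Lane.LANGUAGE_EXPANSION: "Epic 08",
}


def classify_request(text: str, narrowed_to: Optional[Lane] = None) -> Optional[Lane]:
    """Return a lane for the request, or None when the text gives no signal.

    narrowed_to models the user explicitly narrowing the scope.
    """
    if narrowed_to is not None:
        return narrowed_to
    # Rule from the skill: general Swedish support means lane 3.
    if "general swedish support" in text.lower():
        return Lane.LANGUAGE_EXPANSION
    return None
```

A None result means the request still needs manual classification before anything is proposed.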
Core Project Position
- The main project target is full fine-tuning of the 1.7B base model.
- Hemma is viable for bounded pilot work.
- Colab H100 is the scale-up lane, not the only viable lane.
- The end goal is general Swedish support, not a single teacher voice.
- The first bounded Task 101 Hemma pilot must consume a deterministic training bundle projected from the frozen pilot root, not the generic promoted Task 103 preprocessing root.
- The canonical repo-owned materialization surface for that bundle is:
pdm run qwen-pilot-bundle build
- The detached Qwen pilot runtime must record both the train and held-out eval manifest paths in launch/status/report metadata while staying explicit that upstream sft_12hz.py is still train-only and does not perform in-training evaluation.
- Scheduled Qwen pilot runs now use the canonical 500/100/3 posture: durable checkpoint every 500 optimizer steps, held-out eval every 100 steps, retain newest 3 durable trainer-state checkpoints.
- For older pre-schedule checkpoints, the canonical recovery order is: standalone held-out eval first, then resume only if the saved cursor is compatible with the current bundle contract.
- If a legacy launch requires --pilot-bundle-root override, do not assume the saved intra-epoch cursor is still meaningful; treat any impossible cursor as a fail-closed condition, not a warning.
- If a bounded recovery probe already produced a newer durable checkpoint with a compatible cursor, prefer that newer checkpoint for the next strict resume rather than resetting to the older legacy step.
- If a preserved legacy launch still carries stale checkpoint cadence or retention values, pass explicit resume overrides so the relaunched lane truthfully matches the current 500/100/3 scheduled posture.
- If a resumed Task 101 lane fails with repeated non-finite behavior, do not keep retrying blind full training runs. The canonical next step is: status -> diagnose-non-finite -> fix -> bounded retry.
- Story 32 is now the governing protocol for active Qwen experiment work:
  - classify every active run as provenance, mechanism, or recovery
  - keep one question per run
  - record the full state vector in the Task 101 progress ledger before making causal claims
  - use the promotion ladder: local gate -> short bounded fresh-start run -> longer governed proof
- Current active surface matrix:
  - qwen-historical-pilot-control: provenance
  - qwen-stability-lab: mechanism
  - governed qwen-train launch/status fresh-start proof lane: recovery, blocked until promotion
  - qwen-freshstart-proof and qwen-backward-lineage: legacy-readonly
  - qwen-fallback-proof and qwen-fallback-accumulation-proof: deprecated for new work
- Current operator truth:
  - T221 is now resolved as negative recreated-control evidence: the recreated original-recipe shape plus only the T206 token-span fix still fails immediately under the current trainer/runtime
    - treat that as provenance evidence only, not as a mechanism or recovery answer
  - Qwen stability lab remains the active mechanism lane
  - T225 is complete as the exact parity contract
  - T226 is now complete as the committed local parity-probe surface: pdm run qwen-parity-probe run
    - the live in-image historical-bundle run under task226-20260317t224307Z found no meaningful checkpoint divergence between the current and intended paths
  - T219 is now recorded as negative bounded evidence under task219-20260317t180700z-a1
  - T228 is now complete as the ranked closure of that family
  - T229 is now complete as the narrowed rerun under task229-20260318t064712z-a1
    - the target sub_talker_loss family localizes to talker_core.layer_16.input_layernorm
  - T230 is now complete as the negative bounded normalization-entry rerun under task230-20260318t082049z-a1
  - T231 is now complete as the explicit no-winner promotion decision
  - T232 is now complete as the lane decision to stay in mechanism
  - T233 is now complete as the normalization-internal rerun under task233-20260318t112544z-a1
    - the first verified internal surface is now talker_core.layer_16.input_layernorm.output
  - T234 is now complete under task234-20260318t123644z-a1
    - no variant stayed finite or earned promotion; the strongest 0p5 member shifted the pair and line-13 sub_talker_loss cases to talker_core.layer_15.output, while line-4 still first broke at talker_core.layer_16.input_layernorm
  - T235 is now complete under task235-20260318t140352z-a1
    - the mixed sub_talker_loss result is repeatable: pair and line-13 stay at talker_core.layer_15.output, while line-4 stays at talker_core.layer_16.input_layernorm
  - T236 is now complete under task236-20260318t145434z-a1
    - the outlier is a genuine row-local seam difference: pair and line-13 stay at talker_core.layer_15.output, while line-4 stays at talker_core.layer_16.input_layernorm.output
  - T237 is now complete under task237-20260318t154708z-a1
    - the 1e3 fp32-output-cap winner converged pair, line-13, and line-4 sub_talker_loss to talker_core.layer_15.output
  - T240 is now complete under task240-20260318t165458z-a1
    - all three normative sub_talker_loss rows first broke at talker_core.layer_15.output, so the convergence class is converged_layer15_output
  - T241 is now complete under task241-20260318t175714z-a1
    - all three normative sub_talker_loss rows still first broke at talker_core.layer_15.output, so the classification is converged_layer15_output_residual
  - T242 is now complete as the permanent Hemma bind-root contract: the repo-rendered service is installed and active, status now proves the home roots are mounted onto the canonical /srv/scratch trees, and probe confirms Docker must use /home/paunchygent/.data/sir-convert-a-lot/{build,cache} as the effective bind roots
  - T243 is now complete under task243-20260318t190832z-a1
    - all three normative sub_talker_loss rows first broke at talker_core.layer_15.output, so the classification is converged_layer15_output_return
  - T244 is now complete under task244-20260318t193736z-a1
    - all three normative sub_talker_loss rows still first broke at talker_core.layer_15.output, so the classification is converged_output_return
  - T245 is now complete under task245-20260318t202916z-a1
    - all three normative sub_talker_loss rows still first broke at talker_core.layer_15.output, so the classification is multiply_not_causal
  - T246 is now the immediate diagnosis-only mechanism slice
  - T246 must split the fp32-scaled
Content truncated.
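For the 500/100/3 scheduled posture listed under Core Project Position, a minimal sketch of the cadence and retention arithmetic may help. It assumes steps are counted as optimizer steps starting at 1; SchedulePosture, actions_at, and retained_checkpoints are hypothetical names for illustration and do not describe the repo's trainer or the upstream sft_12hz.py.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class SchedulePosture:
    checkpoint_every: int = 500  # durable checkpoint interval, in optimizer steps
    eval_every: int = 100        # held-out eval interval, in optimizer steps
    keep_newest: int = 3         # durable trainer-state checkpoints to retain


def actions_at(step: int, posture: SchedulePosture = SchedulePosture()) -> Set[str]:
    """Return the scheduled actions due at a given optimizer step."""
    actions: Set[str] = set()
    if step <= 0:
        return actions
    if step % posture.eval_every == 0:
        actions.add("eval")
    if step % posture.checkpoint_every == 0:
        actions.add("checkpoint")
    return actions


def retained_checkpoints(
    saved_steps: Iterable[int], posture: SchedulePosture = SchedulePosture()
) -> List[int]:
    """Return the steps of the newest checkpoints kept under the retention rule."""
    return sorted(saved_steps)[-posture.keep_newest:]
```

Under these assumptions, step 500 triggers both a held-out eval and a durable checkpoint, while step 250 triggers neither. A relaunched legacy lane with stale cadence values would pass its own SchedulePosture only to show the mismatch, since the skill requires explicit resume overrides to match 500/100/3.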