agentskills.codes

sir-convert-a-lot-qwen-finetuning


Install

mkdir -p .claude/skills/sir-convert-a-lot-qwen-finetuning && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/14778" && unzip -o skill.zip -d .claude/skills/sir-convert-a-lot-qwen-finetuning && rm skill.zip

Installs to .claude/skills/sir-convert-a-lot-qwen-finetuning

Activation

This is the description your AI agent reads to decide when to run this skill — the better it matches your request, the more reliably it fires.

Model-specific operator skill for Qwen3-TTS fine-tuning on Hemma and Colab. Use when the task is specifically about Qwen TTS training, Swedish language expansion with Qwen, Qwen preprocessing or runtime policy, or deciding whether a fine-tuned Qwen model should enter the Sir Convert-a-Lot sidecar candidate lane.
313 chars · has a “when” trigger · longer than Claude Code's old 250-char listing cap (fine on current versions)

About this skill

Sir Convert-a-Lot Qwen Finetuning

Use This Skill When

  • The user wants to fine-tune Qwen/Qwen3-TTS-12Hz-1.7B-Base.
  • The user wants Swedish support, not just a single custom voice.
  • The task involves choosing between Hemma and Colab H100.
  • The task involves ROCm, Triton flash attention, or GPU container policy.
  • The task involves Swedish speech data curation, preprocessing, or evaluation.
  • The task involves deciding whether a trained model is good enough to become a sidecar candidate later.

Do Not Use This Skill For

  • Normal sidecar benchmarking that does not involve model training.
  • Chatterbox, F5, OpenVoice, or MMS implementation work unless the task is explicitly comparing them against a future Qwen fine-tuned candidate.
  • Generic speech-model training questions when no Qwen-specific decision is in play. For those, use the broader speech-model-finetuning-on-hemma skill.

Source of Truth

Use this skill together with the broader local skills and the runbooks, backlog items, rules, and decision records listed here:

  • .codex/skills/speech-model-finetuning-on-hemma/SKILL.md
  • .codex/skills/sir-convert-a-lot-colab-hemma/SKILL.md
  • docs/runbooks/runbook-qwen3-swedish-finetuning-on-hemma-and-colab.md
  • docs/runbooks/runbook-hemma-devops-and-gpu.md
  • docs/backlog/epics/epic-08-qwen3-tts-swedish-language-expansion-fine-tuning-on-hemma-and-colab.md
  • docs/backlog/stories/story-24-swedish-multi-speaker-corpus-preprocessing-and-evaluation-for-qwen3-tts.md
  • docs/backlog/tasks/task-116-expand-rixvox-staging-and-run-a-sustained-detached-row-processing-window-for-the-bounded-hemma-pilot.md
  • docs/backlog/stories/story-25-containerized-qwen3-tts-swedish-full-finetune-baseline-on-hemma-and-colab.md
  • docs/backlog/tasks/task-141-define-frozen-qwen-pilot-dataset-use-for-finetuning.md
  • docs/backlog/tasks/task-142-materialize-frozen-qwen-pilot-training-bundle-for-task-101.md
  • docs/backlog/stories/story-32-consolidate-qwen-experiment-governance-and-surface-taxonomy.md
  • .codex/rules/096-qwen-experiment-governance.md
  • docs/decisions/0006-hemma-sidecar-tts-architecture-and-non-pdf-gpu-governance.md
  • docs/decisions/0007-reusable-multi-backend-tts-sidecar-capability-contract.md

Upstream truth to verify before major claims or runtime changes:

First Move

Before proposing anything, classify the request into one of these lanes:

  1. Benchmark lane
    • Serving/runtime truth only.
    • Current repo home: Task 79 / Task 98.
  2. Single-speaker adaptation lane
    • Useful for voice transfer experiments.
    • Not the same as general Swedish support.
  3. Language-expansion lane
    • Multi-speaker Swedish.
    • Current repo home: Epic 08.

If the user says "general Swedish support," always choose lane 3 unless they explicitly narrow the scope.
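
The lane rule above can be sketched in a few lines of Python. This is only an illustration: the Lane enum, the classify_request helper, and every keyword hint are hypothetical names introduced here, not part of the skill or the repo. The one behavior taken from the text is that a request for general Swedish support lands in lane 3 unless the user explicitly narrows the scope.

```python
from enum import Enum
from typing import Optional


class Lane(Enum):
    BENCHMARK = 1           # serving/runtime truth only (Task 79 / Task 98)
    SINGLE_SPEAKER = 2      # voice transfer experiments
    LANGUAGE_EXPANSION = 3  # multi-speaker Swedish (Epic 08)


# Hypothetical keyword hints; the skill does not define any of these.
_NARROWING_HINTS = ("single speaker", "single-speaker", "custom voice", "voice transfer")
_SWEDISH_HINTS = ("general swedish", "swedish support", "multi-speaker", "language expansion")
_BENCHMARK_HINTS = ("benchmark", "latency", "serving", "throughput")


def classify_request(text: str) -> Optional[Lane]:
    """Return the lane for a request, or None when the operator should ask."""
    lowered = text.lower()
    narrowed = any(hint in lowered for hint in _NARROWING_HINTS)
    # General Swedish support means lane 3 unless the scope is explicitly narrowed.
    if not narrowed and any(hint in lowered for hint in _SWEDISH_HINTS):
        return Lane.LANGUAGE_EXPANSION
    if narrowed:
        return Lane.SINGLE_SPEAKER
    if any(hint in lowered for hint in _BENCHMARK_HINTS):
        return Lane.BENCHMARK
    return None  # nothing matched: clarify before proposing anything
```

Returning None instead of guessing keeps the classification step explicit: an unmatched request is a prompt to ask the user, not a silent default.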

Core Project Position

  • The main project target is full fine-tuning of the 1.7B base model.
  • Hemma is viable for bounded pilot work.
  • Colab H100 is the scale-up lane, not the only viable lane.
  • The end goal is general Swedish support, not a single teacher voice.
  • The first bounded Task 101 Hemma pilot must consume a deterministic training bundle projected from the frozen pilot root, not the generic promoted Task 103 preprocessing root.
  • The canonical repo-owned materialization surface for that bundle is:
    • pdm run qwen-pilot-bundle build
  • The detached Qwen pilot runtime must record both the train and held-out eval manifest paths in launch/status/report metadata while staying explicit that upstream sft_12hz.py is still train-only and does not perform in-training evaluation.
  • Scheduled Qwen pilot runs now use the canonical 500/100/3 posture: durable checkpoint every 500 optimizer steps, held-out eval every 100 steps, retain newest 3 durable trainer-state checkpoints.
  • For older pre-schedule checkpoints, the canonical recovery order is: standalone held-out eval first, then resume only if the saved cursor is compatible with the current bundle contract.
  • If a legacy launch requires --pilot-bundle-root override, do not assume the saved intra-epoch cursor is still meaningful; treat any impossible cursor as a fail-closed condition, not a warning.
  • If a bounded recovery probe already produced a newer durable checkpoint with a compatible cursor, prefer that newer checkpoint for the next strict resume rather than resetting to the older legacy step.
  • If a preserved legacy launch still carries stale checkpoint cadence or retention values, pass explicit resume overrides so the relaunched lane truthfully matches the current 500/100/3 scheduled posture.
  • If a resumed Task 101 lane fails with repeated non-finite behavior, do not keep retrying blind full training runs. The canonical next step is: status -> diagnose-non-finite -> fix -> bounded retry.
  • Story 32 is now the governing protocol for active Qwen experiment work:
    • classify every active run as provenance, mechanism, or recovery
    • keep one question per run
    • record the full state vector in the Task 101 progress ledger before making causal claims
    • use the promotion ladder: local gate -> short bounded fresh-start run -> longer governed proof
  • Current active surface matrix:
    • qwen-historical-pilot-control: provenance
    • qwen-stability-lab: mechanism
    • governed qwen-train launch/status fresh-start proof lane: recovery, blocked until promotion
    • qwen-freshstart-proof and qwen-backward-lineage: legacy-readonly
    • qwen-fallback-proof and qwen-fallback-accumulation-proof: deprecated for new work
  • Current operator truth:
    • T221 is now resolved as negative recreated-control evidence: the recreated original-recipe shape plus only the T206 token-span fix still fails immediately under the current trainer/runtime
    • treat that as provenance evidence only, not as a mechanism or recovery answer
    • Qwen stability lab remains the active mechanism lane
    • T225 is complete as the exact parity contract
    • T226 is now complete as the committed local parity-probe surface: pdm run qwen-parity-probe run
    • the live in-image historical-bundle run under task226-20260317t224307Z found no meaningful checkpoint divergence between the current and intended paths
    • T219 is now recorded as negative bounded evidence under task219-20260317t180700z-a1
    • T228 is now complete as the ranked closure of that family
    • T229 is now complete as the narrowed rerun under task229-20260318t064712z-a1
    • the target sub_talker_loss family localizes to talker_core.layer_16.input_layernorm
    • T230 is now complete as the negative bounded normalization-entry rerun under task230-20260318t082049z-a1
    • T231 is now complete as the explicit no-winner promotion decision
    • T232 is now complete as the lane decision to stay in mechanism
    • T233 is now complete as the normalization-internal rerun under task233-20260318t112544z-a1
    • the first verified internal surface is now talker_core.layer_16.input_layernorm.output
    • T234 is now complete under task234-20260318t123644z-a1
    • no variant stayed finite or earned promotion; the strongest 0p5 member shifted the pair and line-13 sub_talker_loss cases to talker_core.layer_15.output, while line-4 still first broke at talker_core.layer_16.input_layernorm
    • T235 is now complete under task235-20260318t140352z-a1
    • the mixed sub_talker_loss result is repeatable: pair and line-13 stay at talker_core.layer_15.output, while line-4 stays at talker_core.layer_16.input_layernorm
    • T236 is now complete under task236-20260318t145434z-a1
    • the outlier is a genuine row-local seam difference: pair and line-13 stay at talker_core.layer_15.output, while line-4 stays at talker_core.layer_16.input_layernorm.output
    • T237 is now complete under task237-20260318t154708z-a1
    • the 1e3 fp32-output-cap winner converged pair, line-13, and line-4 sub_talker_loss to talker_core.layer_15.output
    • T240 is now complete under task240-20260318t165458z-a1
    • all three normative sub_talker_loss rows first broke at talker_core.layer_15.output, so the convergence class is converged_layer15_output
    • T241 is now complete under task241-20260318t175714z-a1
    • all three normative sub_talker_loss rows still first broke at talker_core.layer_15.output, so the classification is converged_layer15_output_residual
    • T242 is now complete as the permanent Hemma bind-root contract: the repo-rendered service is installed and active, status now proves the home roots are mounted onto the canonical /srv/scratch trees, and probe confirms Docker must use /home/paunchygent/.data/sir-convert-a-lot/{build,cache} as the effective bind roots
    • T243 is now complete under task243-20260318t190832z-a1
    • all three normative sub_talker_loss rows first broke at talker_core.layer_15.output, so the classification is converged_layer15_output_return
    • T244 is now complete under task244-20260318t193736z-a1
    • all three normative sub_talker_loss rows still first broke at talker_core.layer_15.output, so the classification is converged_output_return
    • T245 is now complete under task245-20260318t202916z-a1
    • all three normative sub_talker_loss rows still first broke at talker_core.layer_15.output, so the classification is multiply_not_causal
    • T246 is now the immediate diagnosis-only mechanism slice
    • T246 must split the fp32-scaled

Content truncated.
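
As a rough illustration of the 500/100/3 scheduled posture and the fail-closed cursor rule described under Core Project Position, the following Python sketch may help. All names here (SchedulePosture, actions_at, retained, check_cursor) are hypothetical and do not come from the repo; the cursor is assumed, for illustration only, to be a row offset within one epoch of the bundle, and the ordering of eval before checkpoint on a shared step is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulePosture:
    checkpoint_every: int = 500  # durable checkpoint interval, in optimizer steps
    eval_every: int = 100        # held-out eval interval, in optimizer steps
    keep_newest: int = 3         # durable trainer-state checkpoints to retain


def actions_at(step: int, posture: SchedulePosture = SchedulePosture()) -> list[str]:
    """List the scheduled actions due at a given optimizer step."""
    actions = []
    if step > 0 and step % posture.eval_every == 0:
        actions.append("eval")
    if step > 0 and step % posture.checkpoint_every == 0:
        actions.append("checkpoint")
    return actions


def retained(checkpoint_steps: list[int], posture: SchedulePosture = SchedulePosture()) -> list[int]:
    """Keep only the newest durable checkpoints, identified by step number."""
    return sorted(checkpoint_steps)[-posture.keep_newest:]


def check_cursor(cursor_row: int, rows_per_epoch: int) -> int:
    """Fail closed: an impossible cursor raises instead of producing a warning."""
    if not 0 <= cursor_row < rows_per_epoch:
        raise ValueError(
            f"cursor {cursor_row} is impossible for an epoch of {rows_per_epoch} rows"
        )
    return cursor_row
```

Raising on an impossible cursor mirrors the stated policy of treating it as a fail-closed condition; a resume path built this way cannot continue past a cursor that the current bundle contract does not support.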
