sue-update-lesson

Name: sue-update-lesson
Author: dongzhuoyao

Use whenever a durable SUE / scale-up lesson is learned in any sue-*

Install

mkdir -p .claude/skills/sue-update-lesson && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/13522" && unzip -o skill.zip -d .claude/skills/sue-update-lesson && rm skill.zip

Installs to .claude/skills/sue-update-lesson


About this skill

Sue Update Lesson

Overview

Append durable SUE lessons to the correct L1 or L2 SCALE_UP.md file after a workflow issue has been solved and encoded in durable behavior.

DeepResearch Root Contract (HIGHEST PRIORITY)

Before any SUE action in this skill, verify the workspace output-root contract. If any check fails, STOP and abort; do not proceed with discovery, planning, submission, monitoring, cleanup, reporting, or lessons.

The workspace path must be $<SANDBOX>_DEEPRESEARCH_ROOT/workspace/<workspace>/.
All scale-up outputs must be written to $<SANDBOX>_DEEPRESEARCH_ROOT/workspace/<workspace>/scale_up_outputs/ or a subpath declared in runtime.yaml under that root.
$<SANDBOX>_DEEPRESEARCH_ROOT must be resolvable via sue-context (or the selected sandbox's private config for remote backends). Do not fall back to $HOME, arbitrary scratch, or the source tree.
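The output-root rules above can be sketched as a path-containment check. This is a minimal illustration, not the skill's actual implementation: the function name `check_output_path` and the injectable `env` mapping are hypothetical, and the sketch only covers the default `scale_up_outputs/` location (it does not read subpaths declared in runtime.yaml).

```python
import os
from pathlib import Path


def check_output_path(root_env, workspace, output_path, env=None):
    """Return the normalized output path, or raise RuntimeError on a violation.

    root_env is the variable name, e.g. "LUMI_DEEPRESEARCH_ROOT" (hypothetical helper).
    """
    env = os.environ if env is None else env
    root = env.get(root_env, "")
    if not root:
        # No fallback to $HOME, scratch, or the source tree.
        raise RuntimeError(f"blocker: {root_env} is unset or empty")

    outputs_root = Path(os.path.normpath(
        os.path.join(root, "workspace", workspace, "scale_up_outputs")))
    # normpath collapses ".." segments without touching the filesystem.
    candidate = Path(os.path.normpath(output_path))

    if candidate != outputs_root and outputs_root not in candidate.parents:
        raise RuntimeError(f"blocker: {candidate} is outside {outputs_root}")
    return candidate
```

Raising instead of returning a fallback path mirrors the STOP-and-abort wording: the caller cannot accidentally continue with a guessed location.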

This contract takes precedence over sandbox selection, GPU accounting, manifest validation, and all downstream steps.

Hard Sandbox Repo Root Gate

Before any remote query, sync, cleanup, dryrun/fullrun submission, monitor, quota check, or output-path mutation, verify the selected backend exports a non-empty <SANDBOX>_DEEPRESEARCH_ROOT (one of LUMI_DEEPRESEARCH_ROOT, NM5_DEEPRESEARCH_ROOT, SNELLIUS_DEEPRESEARCH_ROOT, BREV_DEEPRESEARCH_ROOT, AUTODL_DEEPRESEARCH_ROOT, or RUNPOD_DEEPRESEARCH_ROOT) and that the directory exists. If the root is unset, empty, missing, or only inferable from cwd, scratch/project roots, PROJECT_ROOT, NM5_PROJECT_ROOT, workspace name, or marker walking, terminate immediately with a blocker naming the missing key. Do not derive a replacement root, do not continue with a guessed checkout, and do not run destructive or quota-consuming commands.
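As a rough sketch, the gate amounts to three checks on one environment variable. The helper name `require_sandbox_root` and the `env` parameter are hypothetical; the key names are the six listed in this section, and the directory check here runs on whichever host executes the code.

```python
import os

# The six root keys named in the gate.
ROOT_KEYS = (
    "LUMI_DEEPRESEARCH_ROOT",
    "NM5_DEEPRESEARCH_ROOT",
    "SNELLIUS_DEEPRESEARCH_ROOT",
    "BREV_DEEPRESEARCH_ROOT",
    "AUTODL_DEEPRESEARCH_ROOT",
    "RUNPOD_DEEPRESEARCH_ROOT",
)


def require_sandbox_root(sandbox, env=None):
    """Return the backend root, or raise a blocker that names the missing key."""
    env = os.environ if env is None else env
    key = f"{sandbox.upper()}_DEEPRESEARCH_ROOT"
    if key not in ROOT_KEYS:
        raise RuntimeError(f"blocker: {key} is not a known root key")
    root = env.get(key, "").strip()
    if not root:
        raise RuntimeError(f"blocker: {key} is unset or empty")
    if not os.path.isdir(root):
        raise RuntimeError(f"blocker: {key} points to a missing directory")
    # Deliberately no fallback to cwd, PROJECT_ROOT, or marker walking.
    return root
```

Every failure message carries the key name, so the blocker report can state exactly which variable to fix.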

Sandbox Communication

When this workflow communicates with a remote sandbox, keep SSH/API calls as control-plane actions: launch, inspect, fetch logs, or stop work. If a remote command is likely to run long, stream substantial output, or require repeated polling, start it inside a detached tmux session on the sandbox and return after verifying the session name and durable log path. Do not keep a local SSH connection open as the job supervisor.

Use a stable session name and a log under the workspace's configured logs_root or scale_up_outputs/logs/. Prefer tmux new-session -d -s <name> 'bash <script> 2>&1 | tee -a <log>'; avoid tmux send-keys. On Slurm-capable sandboxes, submit scheduler jobs normally and use tmux only for long login-node orchestration or monitor loops. On direct-run sandboxes such as Brev, AutoDL, or RunPod, use tmux for long-running remote commands unless the platform provides an equivalent detached process supervisor. If tmux is unavailable on the sandbox, stop and report the exact missing tool instead of silently keeping the connection open.
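The preferred tmux invocation can be assembled as an argument list so that script and log paths are quoted safely. This is only a sketch: `detached_tmux_argv` is a hypothetical helper, and how the resulting command reaches the sandbox (SSH, an API call) is left out.

```python
import shlex
from typing import List


def detached_tmux_argv(session: str, script: str, log: str) -> List[str]:
    """Build: tmux new-session -d -s <name> 'bash <script> 2>&1 | tee -a <log>'."""
    # tmux passes the last argument to a shell, so quote each path inside it.
    inner = f"bash {shlex.quote(script)} 2>&1 | tee -a {shlex.quote(log)}"
    return ["tmux", "new-session", "-d", "-s", session, inner]


def as_remote_command(argv: List[str]) -> str:
    """Join argv into one shell-safe string, e.g. for a single SSH call."""
    return shlex.join(argv)
```

After launching, the caller would still verify the session name and the log path before returning, as described above.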

Record a durable SUE / scale-up lesson so the same mistake is not repeated.

When to Use

Invoke this skill at the end of any sue-* workflow when:

- An unexpected issue was solved.
- A sandbox-specific quirk was discovered.
  - If the quirk changes a backend's reusable access, preflight, launch, or lifecycle rules, treat it as a sandbox-specific lesson and update the matching deepresearch-sandbox/<SANDBOX>_SANDBOX.md file.
  - If it is a one-off or project-specific workaround, record it in the workspace SCALE_UP.md instead.
- A session-specific check or command turned out to be required.
- The user says "remember this", "add this to SCALE_UP.md", "record the lesson", "update sandbox memory", or any equivalent phrase.

This skill is the canonical way to persist lessons across workspace/<workspace_name> projects.

Before debugging any SUE / scale-up issue from scratch, first check the root SCALE_UP.md, the active workspace's SCALE_UP.md, and the active backend's deepresearch-sandbox/<SANDBOX>_SANDBOX.md for an existing solution in the relevant session or section. Apply any matching lesson directly. If the issue is new or an existing lesson needs strengthening, use this skill to update the appropriate memory file, then update the affected .codex/skills/sue-*/SKILL.md when workflow behavior changed.

No Interview

This skill is auto-invoked. Resolve every input from the calling context and the workspace's authoritative scale_up_outputs/<exp_dir>/config/runtime.yaml when it exists. If scale_up_outputs/<exp_dir>/config/runtime.yaml and scale_up_outputs/<exp_dir>/config/scale-up.yaml are missing, stale, or disagree on backend or environment policy, stop and report the contradiction before recording a lesson.

Per-run artifacts and final_result bundles belong under the per-experiment output directory (paths.exp_dir / SUE_EXP_DIR), not directly under a loose <scale_up_outputs_root>/<run_id>/ tree.

Workspace path — from the caller's resolved workspace.
Sandbox — from runtime.yaml.backend or the caller's sandbox context.
Session — from the caller's workflow phase (e.g., dryrun, fullrun).
Scope — inferred from whether the lesson depends on project-specific code, data, or config.
Lesson content — trigger, wrong behavior, correct behavior, from the caller's failure/fix summary.
Source — caller workspace + current date (redact private values).

If a required value cannot be resolved from context, stop and report the exact missing field; do not interview the user.
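A minimal sketch of the no-interview rule: resolve each field from the calling context, and fail with the exact missing field names instead of asking. The field keys and the `resolve_inputs` helper are hypothetical; `runtime` stands for an already-parsed runtime.yaml, whose `backend` key is the one this section names.

```python
from typing import Optional

# Hypothetical field keys for the six inputs listed above.
REQUIRED_FIELDS = ("workspace", "sandbox", "session", "scope", "lesson", "source")


def resolve_inputs(context: dict, runtime: Optional[dict] = None) -> dict:
    """Return resolved inputs, or raise a blocker listing what is missing."""
    resolved = dict(context)
    # Sandbox comes from runtime.yaml's backend first, then the caller's context.
    if runtime and runtime.get("backend"):
        resolved["sandbox"] = runtime["backend"]

    missing = [name for name in REQUIRED_FIELDS if not resolved.get(name)]
    if missing:
        # Report and stop; never prompt the user for the value.
        raise RuntimeError("blocker: missing field(s): " + ", ".join(missing))
    return resolved
```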

Inputs

Workspace path — caller's resolved workspace.
Sandbox — LUMI, Snellius, RunPod, Brev, AutoDL, local, or any.
Session — one of: env_prep, dataset_prep, interface_check, scripts_writing, dryrun, fullrun, monitor, summarize, cleanup, diagnose, reset, audit.
Scope — is the lesson:
- Generic: applies to multiple workspaces (e.g., a LUMI container trap).
- Workspace-specific: tied to this project's data, code paths, or config (e.g., a CO3D LMDB quirk in loop-vggt).
- Sandbox-specific: changes the reusable access, preflight, launch, or lifecycle rules for a specific backend (e.g., a new LUMI MIOPEN cache requirement). Sandbox-specific lessons go to deepresearch-sandbox/<SANDBOX>_SANDBOX.md.
Lesson content — trigger, wrong behavior, correct behavior.
Source — workspace name and date (redact private values).

Resolve DeepResearch repo context

Invoke sue-context to discover the deepresearch repo root and load memory/project.md, AGENTS.md, memory/sue/SCALE_UP.md, config/codex_sync.json, config/sue-templates/runtime.yaml, and .codex/skills/AGENTS.md. Do not proceed until the repo root is resolved.

Targets

| scope | target file |
| --- | --- |
| generic | `<repo>/memory/sue/SCALE_UP.md` |
| workspace-specific | `<workspace_dir>/SCALE_UP.md` |
| sandbox-specific | `<repo>/deepresearch-sandbox/<SANDBOX>_SANDBOX.md` |
| sandbox-convention | `<repo>/deepresearch-sandbox/README.md` (only when the lesson is a cross-backend convention) |
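The scope-to-file mapping above can be written as a small lookup. `target_for` is a hypothetical helper, and the uppercase normalization of the backend name follows the Workflow section; missing-file handling is not covered here.

```python
from pathlib import PurePosixPath


def target_for(scope, repo, workspace_dir=None, sandbox=None):
    """Map a lesson scope to its memory file (paths only; nothing is opened)."""
    repo = PurePosixPath(repo)
    if scope == "generic":
        return repo / "memory" / "sue" / "SCALE_UP.md"
    if scope == "workspace-specific":
        if not workspace_dir:
            raise ValueError("workspace-specific scope needs workspace_dir")
        return PurePosixPath(workspace_dir) / "SCALE_UP.md"
    if scope == "sandbox-specific":
        if not sandbox:
            raise ValueError("sandbox-specific scope needs a sandbox name")
        # Backend names are normalized to uppercase, e.g. lumi -> LUMI_SANDBOX.md.
        return repo / "deepresearch-sandbox" / f"{sandbox.upper()}_SANDBOX.md"
    if scope == "sandbox-convention":
        return repo / "deepresearch-sandbox" / "README.md"
    raise ValueError(f"unknown scope: {scope}")
```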

Workflow

1. Resolve the target path via sue-context.
   - For generic lessons: <repo>/memory/sue/SCALE_UP.md. If that file is missing, fall back to <repo>/../deepresearch-workspace/SCALE_UP.md for backward compatibility.
   - For workspace-specific lessons: <workspace_dir>/SCALE_UP.md. If the file does not exist, run sue-init first to create it with the standard session sections.
   - For sandbox-specific lessons: <repo>/deepresearch-sandbox/<SANDBOX>_SANDBOX.md. Normalize the backend name to uppercase. If the file does not exist, stop and report the missing path.
   - For cross-backend conventions: <repo>/deepresearch-sandbox/README.md.
2. Read the target file. Check whether an identical or equivalent lesson already exists. If it does, do not duplicate it; instead, strengthen the existing entry if the new evidence adds detail.
3. Ensure the right section exists.
   - For SCALE_UP.md targets: if ## <Session> is missing, add it.
   - For sandbox targets: find the matching rule section (e.g., ## Launch Rules, ## Preflight, ## Storage, ## Lifecycle Rules). If no section matches, add a new one at the same level as the existing sections.
4. Append the lesson using the format below. Use a short, stable title.
   - For sandbox targets, keep the entry reusable: use placeholders such as <account>, <scratch-root>, or <host> and put concrete values in the ignored deepresearch-sandbox/config_<sandbox>.txt.
5. If the lesson changes a skill's workflow, update the affected .codex/skills/sue-*/SKILL.md in the same turn (Codex-first; do not edit .claude/ mirrors directly).
6. Report the result. State which file was updated and the lesson title.
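The read, deduplicate, ensure-section, and append steps can be sketched on plain text for a SCALE_UP.md target. `append_lesson` is a hypothetical helper, and it only detects an exact title match; judging whether a lesson is "equivalent" or deserves strengthening is left to the agent.

```python
def append_lesson(text, session, entry):
    """Return (new_text, changed) after adding entry under '## <session>'."""
    lines = text.splitlines()
    title_line = entry.splitlines()[0]  # e.g. "### Short lesson title"
    if title_line in lines:
        return text, False  # same title already recorded: do not duplicate

    heading = f"## {session}"
    if heading not in lines:
        lines += ["", heading]  # create the missing session section at the end

    start = lines.index(heading)
    end = len(lines)
    for i in range(start + 1, len(lines)):
        if lines[i].startswith("## "):  # next section at the same level
            end = i
            break

    block = entry.splitlines() + [""]
    if lines[end - 1] != "":
        block.insert(0, "")  # keep one blank line before the new entry
    lines[end:end] = block
    return "\n".join(lines).rstrip("\n") + "\n", True
```

Working on a string rather than a file keeps the sketch easy to test; a caller would read the target, call the function, and write back only when `changed` is true.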

Lesson Format

### <Short lesson title>
- **sandbox**: LUMI | Snellius | RunPod | Brev | AutoDL | any
- **session**: env_prep | dataset_prep | interface_check | scripts_writing |
  dryrun | fullrun | monitor | summarize | cleanup | diagnose | reset | audit
- **date**: YYYY-MM-DD
- **trigger**: What symptom or question revealed the issue?
- **wrong**: The mistake the agent made or almost made.
- **correct**: The required behavior, check, or command.
- **source**: workspace/<name> or session (private values redacted)
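For illustration, the entry format above could be produced by a small renderer. `render_lesson` and its parameter names are hypothetical; the session values are the ones listed in this document.

```python
from datetime import date

SESSIONS = (
    "env_prep", "dataset_prep", "interface_check", "scripts_writing",
    "dryrun", "fullrun", "monitor", "summarize", "cleanup",
    "diagnose", "reset", "audit",
)


def render_lesson(title, sandbox, session, trigger, wrong, correct, source, when=None):
    """Render one lesson entry in the format shown above."""
    if session not in SESSIONS:
        raise ValueError(f"unknown session: {session}")
    when = when or date.today()
    return "\n".join([
        f"### {title}",
        f"- **sandbox**: {sandbox}",
        f"- **session**: {session}",
        f"- **date**: {when.isoformat()}",  # YYYY-MM-DD
        f"- **trigger**: {trigger}",
        f"- **wrong**: {wrong}",
        f"- **correct**: {correct}",
        f"- **source**: {source}",  # caller must redact private values first
    ])
```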

Output Template

Recorded SUE lesson:
  scope:       generic | workspace-specific (<workspace>) | sandbox (<sandbox>) | sandbox-convention
  target:      <path>
  sandbox:     <sandbox>
  session:     <session>
  title:       <title>
  skill updated: <sue-*/SKILL.md> | none

Anti-Patterns

Do not record one-off accidents or environment-specific transient failures.
Do not duplicate an existing lesson.
Do not commit raw transcripts, secrets, project IDs, hostnames, or tokens.
Do not put concrete private values i

Content truncated.
