sue-runpod-cleanup
sue-runpod-cleanup: Clean stale files under /workspace/ on a RunPod
Install
mkdir -p .claude/skills/sue-runpod-cleanup && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/13497" && unzip -o skill.zip -d .claude/skills/sue-runpod-cleanup && rm skill.zip
Installs to .claude/skills/sue-runpod-cleanup
Activation
This is the description your AI agent reads to decide when to run this skill — the better it matches your request, the more reliably it fires.
sue-runpod-cleanup: Clean stale files under /workspace/ on a RunPod
About this skill
Sue RunPod Cleanup
Workflow
Resolve the RunPod target, verify /workspace, scan the top 5 space-consuming
directories, then install or run automated cleanup for files not accessed in 2
days. Use pod-side tmux for long scans and cleanup commands, and report exact
log paths.
DeepResearch Root Contract (HIGHEST PRIORITY)
Before any SUE action in this skill, verify the workspace output-root contract. If any check fails, STOP and abort; do not proceed with discovery, planning, execution, monitoring, or lessons.
- The workspace path must be $<SANDBOX>_DEEPRESEARCH_ROOT/workspace/<workspace>/.
- All scale-up outputs must be written to $<SANDBOX>_DEEPRESEARCH_ROOT/workspace/<workspace>/scale_up_outputs/ or a subpath declared in runtime.yaml under that root.
- $<SANDBOX>_DEEPRESEARCH_ROOT must be resolvable via sue-context (or the selected sandbox's private config for remote backends). Do not fall back to $HOME, arbitrary scratch, or the source tree.
This contract takes precedence over sandbox selection, GPU accounting, manifest validation, and all downstream steps.
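The output-path half of this contract can be checked mechanically once the root is known. The bash sketch below is illustrative only: output_path_allowed is a hypothetical helper, it covers just the default scale_up_outputs/ subtree (not extra subpaths declared in runtime.yaml), and it rejects "." and ".." components instead of resolving them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: accept a path only if it sits under
# $ROOT/workspace/<workspace>/scale_up_outputs/ (the default output subtree).
# Extra subpaths declared in runtime.yaml are NOT handled by this sketch.
output_path_allowed() {
  local root="$1" ws="$2" path="$3"
  local base="${root%/}/workspace/${ws}/scale_up_outputs"

  # Refuse "." and ".." components instead of trying to resolve them.
  case "/${path}/" in
    */../*|*/./*) return 1 ;;
  esac

  # Quoted patterns match literally, so this is a plain prefix test that
  # also rejects look-alike siblings such as scale_up_outputs_old/.
  case "$path" in
    "$base"|"$base"/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, output_path_allowed "$RUNPOD_DEEPRESEARCH_ROOT" "$WS" "$OUT" || exit 1 would stop before any write lands outside the contract; resolving the root itself is the job of the gate below.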
Hard Sandbox Repo Root Gate
Before any remote query, sync, cleanup, dryrun/fullrun submission, monitor, quota check, or output-path mutation, verify the selected backend exports a non-empty <SANDBOX>_DEEPRESEARCH_ROOT — one of LUMI_DEEPRESEARCH_ROOT, NM5_DEEPRESEARCH_ROOT, SNELLIUS_DEEPRESEARCH_ROOT, BREV_DEEPRESEARCH_ROOT, AUTODL_DEEPRESEARCH_ROOT, or RUNPOD_DEEPRESEARCH_ROOT — and that the directory exists. If the root is unset, empty, missing, or only inferable from cwd, scratch/project roots, PROJECT_ROOT, NM5_PROJECT_ROOT, workspace name, or marker walking, terminate immediately with a blocker naming the missing key. Do not derive a replacement root, do not continue with a guessed checkout, and do not run destructive or quota-consuming commands.
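A minimal bash sketch of this gate, assuming the backend name is passed in as one of the six prefixes listed above; require_deepresearch_root is a hypothetical name, not an existing SUE helper. It reads only the named variable through bash indirect expansion and checks that the directory exists. It never derives a replacement root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the backend's DeepResearch root, or fail with a
# blocker naming the missing key. It never derives a fallback root.
require_deepresearch_root() {
  local backend="$1" key root
  case "$backend" in
    LUMI|NM5|SNELLIUS|BREV|AUTODL|RUNPOD) ;;
    *) echo "BLOCKER: unknown backend '${backend}'" >&2; return 2 ;;
  esac

  key="${backend}_DEEPRESEARCH_ROOT"
  root="${!key:-}"   # bash indirect expansion: value of the variable named by $key

  if [ -z "$root" ]; then
    echo "BLOCKER: ${key} is unset or empty" >&2
    return 1
  fi
  if [ ! -d "$root" ]; then
    echo "BLOCKER: ${key}=${root} is not an existing directory" >&2
    return 1
  fi
  printf '%s\n' "$root"
}
```

Usage: ROOT="$(require_deepresearch_root RUNPOD)" || exit 1 aborts the whole run before any remote or destructive command, which matches the "terminate immediately" rule.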
Sandbox Communication
When this workflow communicates with a remote sandbox, keep SSH/API calls as
control-plane actions: launch, inspect, fetch logs, or stop work. If a remote
command is likely to run long, stream substantial output, or require repeated
polling, start it inside a detached tmux session on the sandbox and return
after verifying the session name and durable log path. Do not keep a local SSH
connection open as the job supervisor.
Use a stable session name and a log under the workspace's configured
logs_root or scale_up_outputs/logs/. Prefer
tmux new-session -d -s <name> 'bash <script> 2>&1 | tee -a <log>'; avoid
tmux send-keys. On direct-run sandboxes such as Brev, AutoDL, or RunPod, use
tmux for long-running remote commands unless the platform provides an equivalent
detached process supervisor. If tmux is unavailable on the sandbox, stop and
report the exact missing tool instead of silently keeping the connection open.
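The preferred tmux pattern could be wrapped as follows. This is a sketch under stated assumptions: launch_detached is a hypothetical helper, the paths are assumed to contain only ordinary characters (printf %q produces bash-style quoting), and the final check can miss a script that finishes almost instantly.

```bash
#!/usr/bin/env bash
# Hypothetical helper: start a long-running pod-side command in a detached
# tmux session whose output is appended to a durable log.
launch_detached() {
  local name="$1" script="$2" log="$3"

  # If tmux is missing, stop and name the missing tool.
  if ! command -v tmux >/dev/null 2>&1; then
    echo "BLOCKER: tmux is not installed on this sandbox" >&2
    return 127
  fi

  # A stable session name doubles as a lock: refuse to start a duplicate.
  # The "=" prefix asks tmux for an exact session-name match.
  if tmux has-session -t "=${name}" 2>/dev/null; then
    echo "tmux session ${name} already exists; not starting another" >&2
    return 1
  fi

  mkdir -p "$(dirname "$log")"
  # No send-keys: the command is given directly to new-session.
  tmux new-session -d -s "$name" \
    "bash $(printf '%q' "$script") 2>&1 | tee -a $(printf '%q' "$log")"

  # Verify the session before returning control to the local side.
  if tmux has-session -t "=${name}" 2>/dev/null; then
    echo "started tmux session ${name}; log: ${log}"
  else
    echo "session ${name} is not running; check ${log}" >&2
    return 1
  fi
}
```

Returning right after the has-session check keeps the local SSH connection a control-plane call; progress is then read from the log, not from the open connection.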
Purpose
Keep RunPod network SSD use small and inexpensive by continuously cleaning stale
data under /workspace on a RunPod pod. "Stale" means not accessed (read) in 2
or more days. This skill deletes everything under /workspace that meets
the age threshold — including experiment outputs, checkpoints, logs, and cached
data. It does not preserve any safelist of paths.
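The age test maps onto find's -atime. The sketch below assumes GNU find on the pod, and list_stale_files is a hypothetical name, not a function from runpod_cleanup_daemon.sh. GNU find ignores the fractional part when counting 24-hour periods, so -atime +1 already means "at least two days ago"; the function therefore passes days - 1. Treat "not accessed" as approximate: with the common relatime mount option, Linux updates atime on read only when it is older than mtime/ctime or more than 24 hours old; noatime disables read updates entirely; and writes never bump atime, so a file that is only appended to (a log, for instance) can look stale.

```bash
#!/usr/bin/env bash
# Hypothetical helper (read-only): list regular files under ROOT whose
# access time is at least DAYS days old. The caller validates DAYS >= 1.
list_stale_files() {
  local root="$1" days="${2:-2}"
  # -atime +N matches files last accessed at least N+1 whole days ago,
  # so +(DAYS-1) selects files untouched for DAYS days or more.
  # -xdev keeps the walk on ROOT's filesystem.
  find "$root" -xdev -type f -atime +"$((days - 1))" -print
}
```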
Every run must first scan /workspace and report the top 5 space-consuming
first-level directories. This scan is read-only and helps the operator see which
workspaces or caches are driving network SSD cost before cleanup runs.
Use this skill when:
- A RunPod pod's /workspace/ is filling up with old data.
- The user explicitly asks to delete old/unused files from a RunPod pod.
- The user is preparing a pod for a new experiment and wants to reclaim space from prior runs.
Warning: This skill is destructive. Deleted files are not recoverable from
RunPod's /workspace/ unless external backups exist. For on-demand runs, first
produce a dry-run preview and require explicit confirmation before deletion. For
daemon mode, deletion is allowed only when the user has already configured that
pod for automated cleanup and the durable log records every deleted path.
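The preview-then-confirm rule and the "log every deleted path" rule could be combined as below. This is a sketch assuming GNU find; cleanup_stale is hypothetical. The delete pass re-evaluates the age test, so it can differ slightly from the preview, and the log (not the preview) is the record of what was removed. It also removes every empty subdirectory under the root, which is broader than removing only the directories this run emptied.

```bash
#!/usr/bin/env bash
# Hypothetical helper: dry run by default; delete only when the 4th
# argument is exactly "yes", appending every removed path to LOG.
cleanup_stale() {
  local root="$1" days="$2" log="$3" confirm="${4:-}"
  local n=$((days - 1))   # find -atime +N means "at least N+1 whole days ago"

  if [ "$confirm" != "yes" ]; then
    # Dry run: list candidate files on stdout and delete nothing.
    find "$root" -xdev -type f -atime +"$n" -print
    echo "dry run only; pass 'yes' as the 4th argument to delete" >&2
    return 0
  fi

  {
    echo "# cleanup $(date -u +%Y-%m-%dT%H:%M:%SZ) root=${root} days=${days}"
    # -print before -delete records each path as find removes it.
    find "$root" -xdev -type f -atime +"$n" -print -delete
    # Then drop empty directories; -mindepth 1 never removes ROOT itself.
    find "$root" -xdev -mindepth 1 -type d -empty -print -delete
  } >> "$log"
}
```

For example, cleanup_stale /workspace 2 /workspace/.cleanup_logs/cleanup.log prints the preview, and repeating the call with a trailing yes deletes. Because appending does not update atime, a log kept under the cleaned root can eventually match the age test itself. Whether to keep it elsewhere or add an exclusion is left open by the skill text above.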
Three Modes of Operation
sue-runpod-cleanup supports three modes:
| mode | trigger | where it runs | use case |
|---|---|---|---|
| On-demand (original) | User invokes skill interactively | Mac mini control plane → SSH to pod | One-time cleanup, pre-experiment prep |
| Pod cron (preferred for cost control) | User wants automated RunPod cleanup | RunPod pod cron | Hourly cleanup of /workspace data not accessed for 2 days |
| Daemon (Hermes) | Hourly launchd LaunchAgent on the Mac mini | Mac mini control plane → SSH to pod(s) | Continuous automated cleanup when pod cron is unavailable |
The pod cron mode is the default cost-saving automation for RunPod network
SSD storage: the cleanup script lives under /workspace/.cleanup_logs/ and
cron runs it hourly. Use the Mac mini daemon mode only when pod-side cron is
unavailable or the operator wants one control-plane service to clean multiple
pods.
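An idempotent way to register the hourly pod cron entry is sketched below. The script name runpod_cleanup.sh and the lock and log file names are hypothetical placeholders (the skill only says the script lives under /workspace/.cleanup_logs/). The sketch also assumes a running cron daemon and util-linux flock in the pod image. flock -n is an added design choice: it makes an hourly run skip instead of overlapping a previous run that is still scanning. merge_cron_line is a pure text filter, so it can be tested without touching a real crontab.

```bash
#!/usr/bin/env bash
# Hypothetical paths: adjust to the script actually installed on the pod.
CRON_LINE='0 * * * * flock -n /workspace/.cleanup_logs/cleanup.lock bash /workspace/.cleanup_logs/runpod_cleanup.sh >> /workspace/.cleanup_logs/cron.log 2>&1'

# Read an existing crontab on stdin and print it back, appending $1
# only if that exact line is not already present (idempotent install).
merge_cron_line() {
  local line="$1" existing
  existing="$(cat)"
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
  fi
  if ! printf '%s\n' "$existing" | grep -qxF -- "$line"; then
    printf '%s\n' "$line"
  fi
}

# On the pod (crontab -l exits non-zero when no crontab exists yet):
#   { crontab -l 2>/dev/null || true; } | merge_cron_line "$CRON_LINE" | crontab -
```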
Scope: Pod-level (one RunPod pod at a time)
sue-runpod-cleanup operates on the remote execution plane for one RunPod
pod's /workspace/ directory, not on the DeepResearch project/repo as a whole.
| scope | meaning | handled by |
|---|---|---|
| Pod-level (this skill) | Delete stale files under /workspace/ on one RunPod pod | sue-runpod-cleanup |
| Project-level (not this skill) | Local repo checkout, full <SANDBOX>_DEEPRESEARCH_ROOT tree, all workspaces | sue-inode-cleanup for diagnosis; never auto-sweep |
The local machine is the control plane only: resolve SSH routes, launch the
cleanup script in a sandbox tmux session, and poll logs. Do not run deletion
commands against the local laptop deepresearch repo or local workspace/
copies.
In scope (pod-level)
- /workspace/ on the selected RunPod pod.
- Files and directories under /workspace/ whose access time is at least the age threshold old (default 2 days).
- Empty directories left behind after file deletion.
Out of scope (never treat as default cleanup targets)
- Files outside /workspace/ on the pod (system paths, /tmp, /home, etc.).
- Local control-plane paths: laptop repo root, local workspace/<workspace>/.
- Other RunPod pods' /workspace/ directories.
- The DeepResearch repo checkout on the pod, unless it lives under /workspace/; in that case it IS targeted, because this skill has no safelist.
Default: one pod + /workspace/ + 2-day threshold. Ask before expanding scope.
Mandatory First Step: Top-5 Space Scan
Before cleanup or cron installation, scan /workspace on the selected pod and
report the top 5 first-level directories by size:
bash .codex/skills/sue-runpod-cleanup/runpod_cleanup_daemon.sh --top5
The remote log is /workspace/.cleanup_logs/top5_space.log. For large network
SSD trees, run the scan in a detached pod-side tmux session and poll the log;
do not keep a foreground SSH session open as the supervisor.
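What the scan computes can be sketched in a few lines of bash. This is a read-only illustration, not the contents of runpod_cleanup_daemon.sh, and top5_dirs is a hypothetical name. Listing children with find (rather than a /workspace/* glob) also counts hidden first-level directories such as .cleanup_logs.

```bash
#!/usr/bin/env bash
# Hypothetical helper (read-only): print the 5 largest first-level
# directories under ROOT as "<size in KiB><TAB><path>", largest first.
top5_dirs() {
  local root="${1:-/workspace}"
  # -mindepth 1 -maxdepth 1 lists direct children, including dot-directories;
  # du -s totals each one, -k reports KiB, -x stays on one filesystem.
  find "$root" -mindepth 1 -maxdepth 1 -type d -exec du -skx {} + 2>/dev/null \
    | sort -rn \
    | head -n 5
}
```

Piping the result through tee -a /workspace/.cleanup_logs/top5_space.log would append it to the log path named above; the real script's log format may differ. On a large tree, run it through the detached tmux pattern from Sandbox Communication.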
Mode 1: On-Demand Cleanup (Interactive)
Up-Front Interview
Read already-resolved values from the workspace's remote
scale_up_outputs/<exp_dir>/config/runtime.yaml on the selected sandbox first;
ask only what is still missing or stale. Do not re-ask answered questions unless
the user explicitly revises or resets them.
- RunPod pod: which pod ID or pod name to target. Default from the active workspace's runtime.yaml if a RunPod pod is recorded there.
- Workspace: which workspace's output tree (for logging and context, not for scoping the deletion). Default from the active workspace name.
- Target directory: confirm /workspace (default, non-negotiable). This skill only cleans /workspace; cleaning other paths requires a different workflow.
- Age threshold: days since last access (default 2). Confirm before proceeding. Acceptable range: 1–30 days.
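The two hard constraints from the interview (target fixed to /workspace, threshold between 1 and 30 whole days) can be enforced before any scan or deletion. This is a minimal sketch; validate_cleanup_inputs is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reject any target other than /workspace and any
# age threshold outside 1-30 whole days.
validate_cleanup_inputs() {
  local target="$1" days="$2"

  if [ "$target" != "/workspace" ] && [ "$target" != "/workspace/" ]; then
    echo "refusing target '${target}': this skill only cleans /workspace" >&2
    return 1
  fi

  case "$days" in
    ''|*[!0-9]*) echo "age threshold must be a whole number of days" >&2; return 1 ;;
  esac

  if [ "$days" -lt 1 ] || [ "$days" -gt 30 ]; then
    echo "age threshold ${days} is outside the 1-30 day range" >&2
    return 1
  fi
}
```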
Resolve DeepResearch repo context (control plane only)
Invoke sue-context to discover the deepresearch repo root and load
memory/project.md, AGENTS.md, memory/sue/SCALE_UP.md, config/codex_sync.json,
config/sue-templates/runtime.yaml, and .codex/skills/AGENTS.md. Use this only to
resolve SSH routes, pod IDs, and backend env — not as the cleanup target. Do
not proceed until the repo root and selected sandbox config are resolved.
Position in the Workflow
sue-config → sue-scripts-writing → sue-dryrun → sue-fullrun
→ sue-cleanup # post-fullrun inode conservation on shared filesystems
→ sue-fullrun-summarize-result
Orthogonal services:
sue-runpod-cleanup # THIS SKILL: pod-level /workspace/ stale-file deletion
sue-inode-cleanup # whole-filesystem inode diagnosis
sue-reset # close an experiment epoch
sue-runpod-cleanup is an orthogonal service (on-demand, pod cron, or daemon); it does not
replace sue-cleanup (which archives bulk experiment outputs on shared
filesystems like LUMI Lustre or Snellius GPFS). RunPod's /workspace/ is
pod-local persistent storage, not a shared parallel filesystem, so the
inode-conservation tar+remove pattern of sue-cleanup does not apply here.
Instead, this skill performs direct deletion of stale files to reclaim disk space.
Handoffs
| direction | skill | edge |
|---|---|---|
| ← in | sue-sandbox-admin / sue-sandbox-setup | pod registration and SSH route resolution |
| ← in | sue-monitor | monitor detects disk-full or inode pressure on a RunPod pod |
| → out | sue-update-lesson | epilogue — append a durable SCALE_UP.md lesson if this run revealed a new RunPod cleanup pattern |
On-Demand Procedure
- Verify precondition. Confirm the target RunPod pod is reachable via SSH. Resolve RUNPOD_SSH from deepresearch-sandbox/config_runpod.txt or the workspace runtime.yaml. If the pod is not reachable, stop and report the exact SSH error.
- Resolve target path. Default to `/wor