agentskills.codes

sue-runpod-cleanup

sue-runpod-cleanup: Clean stale files under /workspace/ on a RunPod

Install

mkdir -p .claude/skills/sue-runpod-cleanup && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/13497" && unzip -o skill.zip -d .claude/skills/sue-runpod-cleanup && rm skill.zip

Installs to .claude/skills/sue-runpod-cleanup

Activation

This is the description your AI agent reads to decide when to run this skill — the better it matches your request, the more reliably it fires.

sue-runpod-cleanup: Clean stale files under /workspace/ on a RunPod

About this skill

Sue RunPod Cleanup

Workflow

Resolve the RunPod target, verify /workspace, scan the top 5 space-consuming directories, then install or run automated cleanup for files not accessed in 2 days. Use pod-side tmux for long scans and cleanup commands, and report exact log paths.

DeepResearch Root Contract (HIGHEST PRIORITY)

Before any SUE action in this skill, verify the workspace output-root contract. If any check fails, STOP and abort; do not proceed with discovery, planning, execution, monitoring, or lessons.

  1. The workspace path must be $<SANDBOX>_DEEPRESEARCH_ROOT/workspace/<workspace>/.
  2. All scale-up outputs must be written to $<SANDBOX>_DEEPRESEARCH_ROOT/workspace/<workspace>/scale_up_outputs/ or a subpath declared in runtime.yaml under that root.
  3. $<SANDBOX>_DEEPRESEARCH_ROOT must be resolvable via sue-context (or the selected sandbox's private config for remote backends). Do not fall back to $HOME, arbitrary scratch, or the source tree.
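A minimal sketch of checks 1 and 2, assuming the root and workspace name have already been resolved (for example through sue-context). The function name check_output_root is hypothetical. It compares path strings only and does not resolve symlinks, and it does not cover alternative subpaths declared in runtime.yaml.

```shell
# Hypothetical helper: the skill states the contract, not how it is enforced.
# Usage: check_output_root ROOT WORKSPACE OUTPUT_PATH
# Succeeds only when OUTPUT_PATH lies under ROOT/workspace/WORKSPACE/scale_up_outputs/.
check_output_root() {
  local root="${1%/}" ws="$2" out="${3%/}" base
  if [ -z "$root" ] || [ -z "$ws" ]; then
    echo "ABORT: DeepResearch root or workspace name is empty" >&2
    return 1
  fi
  # Reject ".." components; this sketch compares strings and does not resolve symlinks.
  case "/$out/" in
    */../*) echo "ABORT: $out contains '..'" >&2; return 1 ;;
  esac
  base="$root/workspace/$ws/scale_up_outputs"
  case "$out" in
    "$base" | "$base"/*) return 0 ;;
  esac
  echo "ABORT: $out is outside $base" >&2
  return 1
}
```

A caller would run this before writing any scale-up output and stop the whole skill on a non-zero status.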

This contract takes precedence over sandbox selection, GPU accounting, manifest validation, and all downstream steps.

Hard Sandbox Repo Root Gate

Before any remote query, sync, cleanup, dryrun/fullrun submission, monitor, quota check, or output-path mutation, verify the selected backend exports a non-empty <SANDBOX>_DEEPRESEARCH_ROOT — one of LUMI_DEEPRESEARCH_ROOT, NM5_DEEPRESEARCH_ROOT, SNELLIUS_DEEPRESEARCH_ROOT, BREV_DEEPRESEARCH_ROOT, AUTODL_DEEPRESEARCH_ROOT, or RUNPOD_DEEPRESEARCH_ROOT — and that the directory exists. If the root is unset, empty, missing, or only inferable from cwd, scratch/project roots, PROJECT_ROOT, NM5_PROJECT_ROOT, workspace name, or marker walking, terminate immediately with a blocker naming the missing key. Do not derive a replacement root, do not continue with a guessed checkout, and do not run destructive or quota-consuming commands.
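The gate can be sketched as a shell function that names the missing key and refuses to guess a replacement. require_sandbox_root is a hypothetical name; the six variable names come from the list above. On success it prints the root so the caller can capture it.

```shell
# Hypothetical gate. Usage: require_sandbox_root SANDBOX   (e.g. RUNPOD)
require_sandbox_root() {
  local sandbox="$1" key val
  # Only the backends named in the skill text are accepted.
  case "$sandbox" in
    LUMI|NM5|SNELLIUS|BREV|AUTODL|RUNPOD) ;;
    *) echo "BLOCKER: unknown sandbox '$sandbox'" >&2; return 1 ;;
  esac
  key="${sandbox}_DEEPRESEARCH_ROOT"
  # Indirect lookup; safe because $key is built only from the whitelist above.
  eval "val=\${$key:-}"
  if [ -z "$val" ]; then
    echo "BLOCKER: $key is unset or empty" >&2
    return 1
  fi
  if [ ! -d "$val" ]; then
    echo "BLOCKER: $key=$val does not exist" >&2
    return 1
  fi
  printf '%s\n' "$val"
}
```

For example, root=$(require_sandbox_root RUNPOD) || exit 1 stops the run before any remote, destructive, or quota-consuming command is issued.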

Sandbox Communication

When this workflow communicates with a remote sandbox, keep SSH/API calls as control-plane actions: launch, inspect, fetch logs, or stop work. If a remote command is likely to run long, stream substantial output, or require repeated polling, start it inside a detached tmux session on the sandbox and return after verifying the session name and durable log path. Do not keep a local SSH connection open as the job supervisor.

Use a stable session name and a log under the workspace's configured logs_root or scale_up_outputs/logs/. Prefer tmux new-session -d -s <name> 'bash <script> 2>&1 | tee -a <log>'; avoid tmux send-keys. On direct-run sandboxes such as Brev, AutoDL, or RunPod, use tmux for long-running remote commands unless the platform provides an equivalent detached process supervisor. If tmux is unavailable on the sandbox, stop and report the exact missing tool instead of silently keeping the connection open.
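A sketch of the launch step, run on the pod itself. launch_detached and its three arguments are hypothetical; the tmux subcommands it uses (new-session -d -s, has-session -t) are standard tmux. It returns a distinct status with an explicit blocker when tmux is missing instead of falling back to a foreground SSH session.

```shell
# Hypothetical helper; NAME, SCRIPT and LOG are supplied by the caller.
# Run on the pod: launch_detached NAME SCRIPT LOG
launch_detached() {
  local name="$1" script="$2" log="$3"
  # Stop with a blocker rather than supervising the job over a foreground SSH session.
  if ! command -v tmux >/dev/null 2>&1; then
    echo "BLOCKER: tmux not found on sandbox" >&2
    return 2
  fi
  # "=NAME" asks tmux for an exact session-name match instead of a prefix match.
  if tmux has-session -t "=$name" 2>/dev/null; then
    echo "BLOCKER: tmux session '$name' already exists" >&2
    return 1
  fi
  mkdir -p "$(dirname "$log")"
  # Assumes SCRIPT and LOG contain no single quotes (they are single-quoted below).
  tmux new-session -d -s "$name" "bash '$script' 2>&1 | tee -a '$log'"
  # A job that exits immediately also ends its session, so this can fail for very short scripts.
  tmux has-session -t "=$name" || return 1
  echo "session=$name log=$log"
}
```

The control plane then only needs the printed session name and log path to poll progress later.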

Purpose

Keep RunPod network SSD use small and inexpensive by continuously cleaning stale data under /workspace on a RunPod pod. "Stale" means not accessed (read) in 2 or more days. This skill deletes everything under /workspace that meets the age threshold — including experiment outputs, checkpoints, logs, and cached data. It does not preserve any safelist of paths.

Every run must first scan /workspace and report the top 5 space-consuming first-level directories. This scan is read-only and helps the operator see which workspaces or caches are driving network SSD cost before cleanup runs.

Use this skill when:

  • A RunPod pod's /workspace/ is filling up with old data.
  • The user explicitly asks to delete old/unused files from a RunPod pod.
  • The user is preparing a pod for a new experiment and wants to reclaim space from prior runs.

Warning: This skill is destructive. Deleted files are not recoverable from RunPod's /workspace/ unless external backups exist. For on-demand runs, first produce a dry-run preview and require explicit confirmation before deletion. For daemon mode, deletion is allowed only when the user has already configured that pod for automated cleanup and the durable log records every deleted path.
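A minimal dry-run sketch, assuming GNU find on the pod; stale_preview is a hypothetical name. Because find's -atime +n matches files whose access age in whole days (remainder discarded) exceeds n, the sketch passes +(DAYS-1), so a 2-day threshold selects files not accessed for at least 48 hours. Results depend on how the mount records access times (for example, the noatime or relatime options). The sketch skips /workspace/.cleanup_logs; this exclusion is an assumption, since the skill text states that no safelist exists.

```shell
# Hypothetical dry-run helper: lists candidates, never deletes.
# Usage: stale_preview ROOT DAYS
stale_preview() {
  local root="${1%/}" days="$2"
  if [ -z "$root" ]; then
    echo "refusing to scan /" >&2
    return 1
  fi
  # -atime +N is true when the access age in whole days (remainder dropped) exceeds N,
  # so +(DAYS-1) selects files not accessed for at least DAYS*24 hours.
  # ROOT/.cleanup_logs is skipped (assumption) so the durable log is never a candidate.
  find "$root" -xdev -path "$root/.cleanup_logs" -prune -o \
    -type f -atime +"$((days - 1))" -print
}
```

On the pod, stale_preview /workspace 2 prints the candidate list for the operator to review before confirming deletion.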

Three Modes of Operation

sue-runpod-cleanup supports three modes:

| mode | trigger | where it runs | use case |
| --- | --- | --- | --- |
| On-demand (original) | User invokes skill interactively | Mac mini control plane → SSH to pod | One-time cleanup, pre-experiment prep |
| Pod cron (preferred for cost control) | User wants automated RunPod cleanup | RunPod pod cron | Hourly cleanup of /workspace data not accessed for 2 days |
| Daemon (Hermes) | launchd LaunchAgent on Mac mini | Mac mini control plane → SSH to pod(s) every hour | Continuous automated cleanup when pod cron is unavailable |

The pod cron mode is the default cost-saving automation for RunPod network SSD storage: the cleanup script lives under /workspace/.cleanup_logs/ and cron runs it hourly. Use the Mac mini daemon mode only when pod-side cron is unavailable or the operator wants one control-plane service to clean multiple pods.
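A sketch of an idempotent cron install, assuming the pod image ships crontab and a running cron daemon. The script name runpod_cleanup.sh, the cron.log file, and the marker comment are hypothetical; only the /workspace/.cleanup_logs/ location and the hourly schedule come from the text above. Re-running the installer replaces the tagged entry instead of adding a duplicate.

```shell
# Hypothetical names: only the directory and the hourly schedule come from the skill text.
CLEANUP_SCRIPT=/workspace/.cleanup_logs/runpod_cleanup.sh
CRON_TAG='# sue-runpod-cleanup'

# Read an existing crontab on stdin; print it with exactly one tagged hourly entry.
merge_cron_entry() {
  grep -v -F "$CRON_TAG" || true   # drop any previous tagged entry, keep the rest
  printf '0 * * * * bash %s >> /workspace/.cleanup_logs/cron.log 2>&1 %s\n' \
    "$CLEANUP_SCRIPT" "$CRON_TAG"
}

# Install on the pod: an empty or missing crontab is treated as no entries.
install_cron() {
  { crontab -l 2>/dev/null || true; } | merge_cron_entry | crontab -
}
```

The trailing marker is a shell comment at run time, so cron executes only the cleanup command.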

Scope: Pod-level (one RunPod pod at a time)

sue-runpod-cleanup operates on the remote execution plane for one RunPod pod's /workspace/ directory, not on the DeepResearch project/repo as a whole.

| scope | meaning | handled by |
| --- | --- | --- |
| Pod-level (this skill) | Delete stale files under /workspace/ on one RunPod pod | sue-runpod-cleanup |
| Project-level (not this skill) | Local repo checkout, full <SANDBOX>_DEEPRESEARCH_ROOT tree, all workspaces | sue-inode-cleanup for diagnosis; never auto-sweep |

The local machine is the control plane only: resolve SSH routes, launch the cleanup script in a sandbox tmux session, and poll logs. Do not run deletion commands against the local laptop deepresearch repo or local workspace/ copies.

In scope (pod-level)

  • /workspace/ on the selected RunPod pod.
  • Files and directories under /workspace/ whose access time is at least 2 days old (the default threshold).
  • Empty directories left behind after file deletion.
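A sketch of a deletion pass covering both in-scope items, assuming GNU find and POSIX sh on the pod. delete_stale and its log line format are hypothetical and are not the contents of runpod_cleanup_daemon.sh. Only directories emptied by this run are removed, by walking up from each deleted file until a non-empty directory or the root itself is reached; pre-existing empty directories are left alone. It skips /workspace/.cleanup_logs (an assumption; the skill text says no safelist exists) so the durable deletion log is not itself deleted.

```shell
# Hypothetical sketch. Usage: delete_stale ROOT DAYS LOG
# Removes regular files under ROOT not accessed for DAYS days, appends each removed
# path to LOG, then removes directories emptied by those deletions (never ROOT itself).
delete_stale() {
  local root="${1%/}" days="$2" log="$3"
  if [ -z "$root" ]; then
    echo "refusing to clean /" >&2
    return 1
  fi
  mkdir -p "$(dirname "$log")"
  find "$root" -xdev -path "$root/.cleanup_logs" -prune -o \
    -type f -atime +"$((days - 1))" -exec sh -c '
      root=$1 log=$2; shift 2
      for f do
        rm -f -- "$f" || continue
        printf "deleted %s\n" "$f" >> "$log"
        d=$(dirname "$f")
        # Walk upward, stopping at the first non-empty directory or at ROOT.
        while [ "$d" != "$root" ] && rmdir -- "$d" 2>/dev/null; do
          printf "rmdir %s\n" "$d" >> "$log"
          d=$(dirname "$d")
        done
      done' sh "$root" "$log" {} +
}
```

Because every removal is appended to LOG before the next file is processed, the log doubles as the per-path deletion record that daemon mode requires.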

Out of scope (never treat as default cleanup targets)

  • Files outside /workspace/ on the pod (system paths, /tmp, /home, etc.).
  • Local control-plane paths: laptop repo root, local workspace/<workspace>/.
  • Other RunPod pods' /workspace/ directories.
  • The DeepResearch repo checkout on the pod, when it lives outside /workspace/. If the checkout lives under /workspace/, it IS targeted: this skill has no safelist.

Default: one pod + /workspace/ + 2-day threshold. Ask before expanding scope.

Mandatory First Step: Top-5 Space Scan

Before cleanup or cron installation, scan /workspace on the selected pod and report the top 5 first-level directories by size:

bash .codex/skills/sue-runpod-cleanup/runpod_cleanup_daemon.sh --top5

The remote log is /workspace/.cleanup_logs/top5_space.log. For large network SSD trees, run the scan in a detached pod-side tmux session and poll the log; do not keep a foreground SSH session open as the supervisor.
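The internals of the --top5 mode are not shown here. A hypothetical equivalent, assuming GNU or BSD find and du, sizes each first-level directory under the root (hidden ones such as .cleanup_logs included) and keeps the five largest. top5_dirs is an illustrative name.

```shell
# Hypothetical equivalent of the --top5 scan; read-only.
# Prints "<KiB><TAB><path>" for the five largest first-level directories under ROOT.
top5_dirs() {
  local root="$1"
  # -x keeps du on one filesystem; errors from unreadable paths are discarded.
  find "$root" -mindepth 1 -maxdepth 1 -type d -exec du -s -k -x {} + 2>/dev/null |
    sort -rn | head -n 5
}
```

To mirror the documented log location, pipe the output through tee /workspace/.cleanup_logs/top5_space.log from inside a detached tmux session.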

Mode 1: On-Demand Cleanup (Interactive)

Up-Front Interview

Read already-resolved values from the workspace's remote scale_up_outputs/<exp_dir>/config/runtime.yaml on the selected sandbox first; ask only what is still missing or stale. Do not re-ask answered questions unless the user explicitly revises or resets them.

  1. RunPod pod — which pod ID or pod name to target. Default from the active workspace's runtime.yaml if a RunPod pod is recorded there.
  2. Workspace — which workspace's output tree (for logging and context, not for scoping the deletion). Default from the active workspace name.
  3. Target directory — confirm /workspace (default, non-negotiable). This skill only cleans /workspace; cleaning other paths requires a different workflow.
  4. Age threshold — days since last access (default 2). Confirm before proceeding. Acceptable range: 1–30 days.
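The accepted range can be enforced before any scan runs. validate_threshold is a hypothetical helper that accepts only whole days from 1 to 30.

```shell
# Hypothetical check for the interview's threshold rule.
validate_threshold() {
  # Reject empty input and anything that is not a plain non-negative integer.
  case "$1" in
    '' | *[!0-9]*)
      echo "threshold must be a whole number of days, got '$1'" >&2
      return 1 ;;
  esac
  if [ "$1" -lt 1 ] || [ "$1" -gt 30 ]; then
    echo "threshold $1 is outside the accepted range 1-30" >&2
    return 1
  fi
}
```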

Resolve DeepResearch repo context (control plane only)

Invoke sue-context to discover the deepresearch repo root and load memory/project.md, AGENTS.md, memory/sue/SCALE_UP.md, config/codex_sync.json, config/sue-templates/runtime.yaml, and .codex/skills/AGENTS.md. Use this only to resolve SSH routes, pod IDs, and backend env — not as the cleanup target. Do not proceed until the repo root and selected sandbox config are resolved.

Position in the Workflow

sue-config → sue-scripts-writing → sue-dryrun → sue-fullrun
  → sue-cleanup                    # post-fullrun inode conservation on shared filesystems
  → sue-fullrun-summarize-result

Orthogonal services:
  sue-runpod-cleanup               # THIS SKILL: pod-level /workspace/ stale-file deletion
  sue-inode-cleanup                # whole-filesystem inode diagnosis
  sue-reset                        # close an experiment epoch

sue-runpod-cleanup is an orthogonal service: it does not replace sue-cleanup, which archives bulk experiment outputs on shared filesystems such as LUMI Lustre or Snellius GPFS. RunPod's /workspace/ is per-pod persistent storage backed by network SSD, not a shared parallel filesystem, so sue-cleanup's inode-conservation tar+remove pattern does not apply here. Instead, this skill deletes stale files directly to reclaim disk space.

Handoffs

| direction | skill | edge |
| --- | --- | --- |
| ← in | sue-sandbox-admin / sue-sandbox-setup | pod registration and SSH route resolution |
| ← in | sue-monitor | monitor detects disk-full or inode pressure on a RunPod pod |
| → out | sue-update-lesson | epilogue: append a durable SCALE_UP.md lesson if this run revealed a new RunPod cleanup pattern |

On-Demand Procedure

  1. Verify preconditions. Resolve RUNPOD_SSH from deepresearch-sandbox/config_runpod.txt or the workspace runtime.yaml, then confirm that the target RunPod pod is reachable via SSH. If the pod is not reachable, stop and report the exact SSH error.

  2. Resolve target path. Default to `/wor


