Install
mkdir -p .claude/skills/evidence && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/13194" && unzip -o skill.zip -d .claude/skills/evidence && rm skill.zipInstalls to .claude/skills/evidence
Activation
This is the description your AI agent reads to decide when to run this skill — the better it matches your request, the more reliably it fires.
Produce visual PR evidence — a screenshot or a video — whenever a change has on-screen impact. If exercising the change would make the screen look different (a rendered view, an error/empty/loading/blocked state, a panel, layout, an icon, motion, a live update) a visual artifact is MANDATORY, not optional. A change can be backend by cause and visible by effect: tests are never a substitute for a visual artifact when there is on-screen impact. Capture rides the project's own Cucumber + Playwright e2e harness on an ephemeral pu box (video of a flow, or a still pulled from the clip), or drives a live kolu with the chrome-devtools MCP for a state no scenario reaches. Then transcode (ffmpeg), host on a GitHub release, and post a `## Evidence` comment. Triggers on "post evidence", "screenshot the change", "PR evidence", "record a video of this", "capture the UI", "show it working", "prove it", or finishing any change whose effect is visible on screen.About this skill
evidence — PR screenshots & video, recorded via the e2e harness on a pu box
0. The forcing function — run this gate FIRST, before any mechanics
Visual evidence is mandatory whenever a change has on-screen impact. Do not skip to the machinery below until you have answered the gate.
The gate (one behavioral question, no architecture judgment):
If someone exercised what this change affects, would the screen look DIFFERENT — before vs. after? A rendered view, an error / empty / loading / blocked state, a panel, layout, an icon, motion, or a live update?
┌─────────────────────────────────────────────┐
│ Would the screen look DIFFERENT, before │
│ vs. after, if someone exercised this? │
└───────────────┬─────────────────────────────┘
NO ──────┤────── YES
│
┌──────────────────────┘
│ │
State it, then skip: A visual artifact is MANDATORY.
one line in the PR — Now: static or motion?
"No visual impact: ┌───────────────┴────────────────┐
<why>." Silent skip STATIC MOTION
is NOT allowed. end-state · single moment · transition · multi-step flow ·
before↔after comparison · live update · latency-to-payoff ·
blocked/error/empty state animation · drag/resize
│ │
SCREENSHOT (§A) VIDEO (§B)
- YES → a visual artifact is mandatory. Not "worth proving", not optional. Pick image or video by the rule below and produce it.
- NO → state it, then skip. A genuine no-visual change (pure internal refactor, build / CI /
tooling, protocol-only change with no rendered surface) is the only legitimate skip — and even
then you must write one explicit line in the PR:
No visual impact: <why>.Skipping silently is not allowed. Do not over-trigger: a true no-visual change must stay skippable with that one line — the gate asks whether the screen differs, not whether any byte changed. If exercising the change leaves the screen identical, skip it with the one line; don't manufacture a tenuous pixel.
Escape hatches that are NOT valid — these are closed
A real run skipped the artifact with each of these. None of them holds. (The run fixed a backend
path guard that followed symlinks, then posted the unit-test suite as the ## Evidence comment.
The user rejected it: "unit tests prove the logic, but you want to SEE it in the actual Code
browser.")
- "It's a backend change / there's no UI surface." A change can be backend by cause and visible by effect. The symlink path guard is pure backend logic — yet when the fixed guard rejects a symlinked file the Code tab shows an error / blocked state instead of file contents. That on-screen difference is exactly what the gate asks about. Trace the effect to the screen, not the cause to a layer.
- "No existing scenario exercises it." That decides how you capture (drive it live — §A2's chrome-devtools path), never whether you capture. Absence of a scenario is not a skip.
- "It renders only in a terminal / it's a CLI / TUI / daemon, so a pasted text transcript is the
evidence." A terminal, CLI, or TUI render is an on-screen surface — the gate answers YES, not
NO. "No kolu (browser) surface" is not "no on-screen surface": the screen the user means is
whatever the change makes someone look at, terminal included. And a real text transcript — a
pasted code-block of the binary's actual output — is not a visual artifact even though it came
from really running the binary: the user wants to see it render. Capture an actual
vhs/asciinemarecording of the real binary against live data (a GIF/mp4, or a still pulled from it — see the vhs section), never a pasted transcript. A real run posted exactly this — "CLI/daemon change with no on-screen surface, so the faithful artifact is a terminal transcript" — and the user rejected it: "I want actual terminal recording for evidence." - "The test suite / the unit tests are the honest proof." Tests prove the logic; they are never a substitute for a visual artifact when there is on-screen impact. They may accompany the artifact, never replace it. The user wants to SEE it in the actual app.
- "I rebuilt the artifact from the code's own output / a formatter / a mock — close enough." A
synthesized artifact — formatter output pasted as if it were a recording, a hand-assembled table,
a screenshot of mocked data, a clip of a stub — is a fabrication: the gate defeated while
pretending to pass it, not evidence. The artifact must come from really executing the change
end-to-end against real data, and before you post it you must read the real output and confirm
the feature actually works — running it for real is also the cheapest place to catch "it doesn't
work," because a fabricated artifact looks correct by construction and hides the very bug evidence
exists to expose. A real run did exactly this: it posted formatter output as the
## Evidencefor a CLIstatuscommand without ever invoking the binary; when the user ran it the table was empty (a real resolution-race bug the fake artifact had concealed) — "It doesn't work LOL." Author's own verdict afterward: "it was formatter output, not a real recording. That was wrong of me." For a CLI/TUI this means an actualvhs/asciinemacapture of the real binary against live data (see the vhs section), never a reconstruction of what its output would look like.
Image vs. video — the decision rule
- SCREENSHOT (image) — a static end-state, a single moment, or a before↔after comparison: a rendered view, an error / empty / loading / blocked state, a panel that now appears, a layout or icon change. One frame tells the whole story. When there's no motion, a screenshot is enough.
- VIDEO — motion is the point: a transition, a multi-step flow, a live update, latency-to- payoff, an animation, a drag/resize. You need to watch it happen.
A value-over-time display is MOTION, never a single moment. If the element's job is to change on its own — a running timer, "Running for" / relative-time / "last seen" string, countdown, live counter, progress that advances, a tick at some cadence — classify it VIDEO and watch it tick. A still of
Running for 0sis structurally blind to the actual defect: a frozen clock and a live one are pixel-identical in one frame, so a screenshot can never prove the tick fires at the cadence the displayed precision implies (seconds shown ⇒ must update every second). Classify by what the element does over time, not by what the panel looks like at one instant.
Image and video are co-equal first-class outputs. A screenshot is a complete deliverable, not a runner-up to video. (A before↔after of a static change is two stills; only reach for two clips when the difference is itself motion — e.g. an animation-timing change.)
Capture runs on an ephemeral pu box (see the pu skill), not locally — so evidence
reflects a clean, CI-like build of the PR's own commit and nothing touches the user's
machine. The box has its own loopback, so the harness binds plain ports there with zero
risk to anything the user is running.
Prefer the project's existing Cucumber + Playwright e2e harness — record an existing scenario by
name. It drives every UI surface through a maintained step library, so the clip is produced by the
same code CI exercises and the .feature files stay pristine. That's why it's the default.
But the artifact is what matters — never skip for lack of a canned path. When the harness can't
reach the state (and no scenario is worth authoring from existing steps), be inventive: drive a live
kolu with the chrome-devtools MCP (§A2), pull a still from a recorded clip (§A1), use vhs for a
TUI, or hand-roll a one-off driver as a last resort. Be inventive when the harness genuinely doesn't
fit — not as a shortcut past a quick scenario that would do.
What "prefer the harness" does and doesn't mean. The one thing worth avoiding is a parallel video-capture harness that duplicates the step library and drifts from CI — so for video of a flow, reuse the maintained steps (author a quick scenario rather than scripting a flow from scratch). Everything else is fair game: a screenshot of a static state, the ordinary setup a screenshot needs (starting the server, staging an on-disk precondition such as planting a symlink), pulling a still from a recorded clip, or driving a state live with the chrome-devtools MCP to
take_screenshot. None of these are off-limits.
Delegate to a subagent (Agent(subagent_type="general-purpose", model="sonnet")) so
the main context stays clear of capture noise. Brief it with the box name, the branch, whether the
deliverable is an image or a video, the scenario to record (feature file + exact scenario name) or
the live state to drive, a <slug>, and the PR number; have it return only the markdown body it
posted.
How the harness records (wired in packages/tests/support/hooks.ts, gated on KOLU_EVIDENCE)
Off by default — normal runs pay nothing. With KOLU_EVIDENCE=1 the e2e harness:
- sets Playwright
recordVideoon the browser context (size= the evidence viewport, so the capture is 1:1 with no downscaling); - records at a denser 1280×720 viewport (the normal 1920×1080 desktop floats the UI small in empty canvas — see Legibility);
- adds
slowMoso the lead-up is watchable; - **
Content truncated.