evidence

Name: evidence
Author: juspay

Install

mkdir -p .claude/skills/evidence && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/13194" && unzip -o skill.zip -d .claude/skills/evidence && rm skill.zip

Installs to .claude/skills/evidence

Activation

This is the description your AI agent reads to decide when to run this skill — the better it matches your request, the more reliably it fires.

Produce visual PR evidence — a screenshot or a video — whenever a change has on-screen impact. If exercising the change would make the screen look different (a rendered view, an error/empty/loading/blocked state, a panel, layout, an icon, motion, a live update) a visual artifact is MANDATORY, not optional. A change can be backend by cause and visible by effect: tests are never a substitute for a visual artifact when there is on-screen impact. Capture rides the project's own Cucumber + Playwright e2e harness on an ephemeral pu box (video of a flow, or a still pulled from the clip), or drives a live kolu with the chrome-devtools MCP for a state no scenario reaches. Then transcode (ffmpeg), host on a GitHub release, and post a `## Evidence` comment. Triggers on "post evidence", "screenshot the change", "PR evidence", "record a video of this", "capture the UI", "show it working", "prove it", or finishing any change whose effect is visible on screen.

959 chars✓ has a “when” triggerlonger than Claude Code's old 250-char listing cap (fine on current versions)

About this skill

evidence — PR screenshots & video, recorded via the e2e harness on a pu box

0. The forcing function — run this gate FIRST, before any mechanics

Visual evidence is mandatory whenever a change has on-screen impact. Do not skip to the machinery below until you have answered the gate.

The gate (one behavioral question, no architecture judgment):

If someone exercised what this change affects, would the screen look DIFFERENT — before vs. after? A rendered view, an error / empty / loading / blocked state, a panel, layout, an icon, motion, or a live update?

                  ┌─────────────────────────────────────────────┐
                  │  Would the screen look DIFFERENT, before     │
                  │  vs. after, if someone exercised this?       │
                  └───────────────┬─────────────────────────────┘
                          NO ──────┤────── YES
                                   │
            ┌──────────────────────┘
            │                                         │
   State it, then skip:                     A visual artifact is MANDATORY.
   one line in the PR —                     Now: static or motion?
   "No visual impact:               ┌───────────────┴────────────────┐
    <why>." Silent skip          STATIC                            MOTION
   is NOT allowed.          end-state · single moment ·       transition · multi-step flow ·
                            before↔after comparison ·         live update · latency-to-payoff ·
                            blocked/error/empty state          animation · drag/resize
                                   │                                 │
                              SCREENSHOT (§A)                    VIDEO (§B)

YES → a visual artifact is mandatory. Not "worth proving", not optional. Pick image or video by the rule below and produce it.
NO → state it, then skip. A genuine no-visual change (pure internal refactor, build / CI / tooling, protocol-only change with no rendered surface) is the only legitimate skip — and even then you must write one explicit line in the PR: No visual impact: <why>. Skipping silently is not allowed. Do not over-trigger: a true no-visual change must stay skippable with that one line — the gate asks whether the screen differs, not whether any byte changed. If exercising the change leaves the screen identical, skip it with the one line; don't manufacture a tenuous pixel.

Escape hatches that are NOT valid — these are closed

A real run skipped the artifact with each of these. None of them holds. (The run fixed a backend path guard that followed symlinks, then posted the unit-test suite as the ## Evidence comment. The user rejected it: "unit tests prove the logic, but you want to SEE it in the actual Code browser.")

"It's a backend change / there's no UI surface." A change can be backend by cause and visible by effect. The symlink path guard is pure backend logic — yet when the fixed guard rejects a symlinked file the Code tab shows an error / blocked state instead of file contents. That on-screen difference is exactly what the gate asks about. Trace the effect to the screen, not the cause to a layer.
"No existing scenario exercises it." That decides how you capture (drive it live — §A2's chrome-devtools path), never whether you capture. Absence of a scenario is not a skip.
"It renders only in a terminal / it's a CLI / TUI / daemon, so a pasted text transcript is the evidence." A terminal, CLI, or TUI render is an on-screen surface — the gate answers YES, not NO. "No kolu (browser) surface" is not "no on-screen surface": the screen the user means is whatever the change makes someone look at, terminal included. And a real text transcript — a pasted code-block of the binary's actual output — is not a visual artifact even though it came from really running the binary: the user wants to see it render. Capture an actual vhs/asciinema recording of the real binary against live data (a GIF/mp4, or a still pulled from it — see the vhs section), never a pasted transcript. A real run posted exactly this — "CLI/daemon change with no on-screen surface, so the faithful artifact is a terminal transcript" — and the user rejected it: "I want actual terminal recording for evidence."
"The test suite / the unit tests are the honest proof." Tests prove the logic; they are never a substitute for a visual artifact when there is on-screen impact. They may accompany the artifact, never replace it. The user wants to SEE it in the actual app.
"I rebuilt the artifact from the code's own output / a formatter / a mock — close enough." A synthesized artifact — formatter output pasted as if it were a recording, a hand-assembled table, a screenshot of mocked data, a clip of a stub — is a fabrication: the gate defeated while pretending to pass it, not evidence. The artifact must come from really executing the change end-to-end against real data, and before you post it you must read the real output and confirm the feature actually works — running it for real is also the cheapest place to catch "it doesn't work," because a fabricated artifact looks correct by construction and hides the very bug evidence exists to expose. A real run did exactly this: it posted formatter output as the ## Evidence for a CLI status command without ever invoking the binary; when the user ran it the table was empty (a real resolution-race bug the fake artifact had concealed) — "It doesn't work LOL." Author's own verdict afterward: "it was formatter output, not a real recording. That was wrong of me." For a CLI/TUI this means an actual vhs/asciinema capture of the real binary against live data (see the vhs section), never a reconstruction of what its output would look like.

Image vs. video — the decision rule

SCREENSHOT (image) — a static end-state, a single moment, or a before↔after comparison: a rendered view, an error / empty / loading / blocked state, a panel that now appears, a layout or icon change. One frame tells the whole story. When there's no motion, a screenshot is enough.
VIDEO — motion is the point: a transition, a multi-step flow, a live update, latency-to- payoff, an animation, a drag/resize. You need to watch it happen.

A value-over-time display is MOTION, never a single moment. If the element's job is to change on its own — a running timer, "Running for" / relative-time / "last seen" string, countdown, live counter, progress that advances, a tick at some cadence — classify it VIDEO and watch it tick. A still of Running for 0s is structurally blind to the actual defect: a frozen clock and a live one are pixel-identical in one frame, so a screenshot can never prove the tick fires at the cadence the displayed precision implies (seconds shown ⇒ must update every second). Classify by what the element does over time, not by what the panel looks like at one instant.

Image and video are co-equal first-class outputs. A screenshot is a complete deliverable, not a runner-up to video. (A before↔after of a static change is two stills; only reach for two clips when the difference is itself motion — e.g. an animation-timing change.)

Capture runs on an ephemeral pu box (see the pu skill), not locally — so evidence reflects a clean, CI-like build of the PR's own commit and nothing touches the user's machine. The box has its own loopback, so the harness binds plain ports there with zero risk to anything the user is running.

Prefer the project's existing Cucumber + Playwright e2e harness — record an existing scenario by name. It drives every UI surface through a maintained step library, so the clip is produced by the same code CI exercises and the .feature files stay pristine. That's why it's the default.

But the artifact is what matters — never skip for lack of a canned path. When the harness can't reach the state (and no scenario is worth authoring from existing steps), be inventive: drive a live kolu with the chrome-devtools MCP (§A2), pull a still from a recorded clip (§A1), use vhs for a TUI, or hand-roll a one-off driver as a last resort. Be inventive when the harness genuinely doesn't fit — not as a shortcut past a quick scenario that would do.

What "prefer the harness" does and doesn't mean. The one thing worth avoiding is a parallel video-capture harness that duplicates the step library and drifts from CI — so for video of a flow, reuse the maintained steps (author a quick scenario rather than scripting a flow from scratch). Everything else is fair game: a screenshot of a static state, the ordinary setup a screenshot needs (starting the server, staging an on-disk precondition such as planting a symlink), pulling a still from a recorded clip, or driving a state live with the chrome-devtools MCP to take_screenshot. None of these are off-limits.

Delegate to a subagent (Agent(subagent_type="general-purpose", model="sonnet")) so the main context stays clear of capture noise. Brief it with the box name, the branch, whether the deliverable is an image or a video, the scenario to record (feature file + exact scenario name) or the live state to drive, a <slug>, and the PR number; have it return only the markdown body it posted.

How the harness records (wired in `packages/tests/support/hooks.ts`, gated on `KOLU_EVIDENCE`)

Off by default — normal runs pay nothing. With KOLU_EVIDENCE=1 the e2e harness:

sets Playwright recordVideo on the browser context (size = the evidence viewport, so the capture is 1:1 with no downscaling);
records at a denser 1280×720 viewport (the normal 1920×1080 desktop floats the UI small in empty canvas — see Legibility);
adds slowMo so the lead-up is watchable;
**

Content truncated.

More by juspay

View all by juspay →

do

juspay

Do a task end-to-end — implement, PR, CI loop, ship. ONLY invoke when the user explicitly types `/do` or `$do`; never auto-select from a natural-language request, even one that sounds like an end-to-end task.

Install

mkdir -p .claude/skills/evidence && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/13194" && unzip -o skill.zip -d .claude/skills/evidence && rm skill.zip

Installs to .claude/skills/evidence

Safety

Review before install

Runs shell / code

Automated static scan of the SKILL.md and repo. A flag describes what the skill can do — not a verdict. Always review code before installing.

Source & maintenance

Updated

1d ago

License

AGPL-3.0

Repo stars

Loads

~4,994 tokens

Stars are for the whole repository, not this skill alone.

Stats

Views

Installs

Author

juspay

2 skills published

Links

Source code

evidence

Install

Activation

About this skill

evidence — PR screenshots & video, recorded via the e2e harness on a pu box

0. The forcing function — run this gate FIRST, before any mechanics

Escape hatches that are NOT valid — these are closed

Image vs. video — the decision rule

How the harness records (wired in packages/tests/support/hooks.ts, gated on KOLU_EVIDENCE)

More by juspay

do

Search skills

How the harness records (wired in `packages/tests/support/hooks.ts`, gated on `KOLU_EVIDENCE`)