agilab-ui-robot-validation
Validate AGILAB Streamlit UI changes with the repo's browser and widget robots. Use when touching ABOUT, PROJECT, ORCHESTRATE, ANALYSIS, SETTINGS, sidebar flows, first-proof wizard links, notebook import, screenshots, or public demo UI evidence.
Install
mkdir -p .claude/skills/agilab-ui-robot-validation && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/16556" && unzip -o skill.zip -d .claude/skills/agilab-ui-robot-validation && rm skill.zip
Installs to .claude/skills/agilab-ui-robot-validation
AGILAB UI Robot Validation
Use this skill when a change affects user-visible Streamlit behavior, page navigation, sidebar actions, wizard links, notebook import/upload flows, UI screenshots, or public demo evidence.
The goal is to catch real browser/session-state failures that helper tests and
static AppTest checks can miss: broken st.switch_page paths, recursive
deep-links, hidden upload controls, stale sidebar state, and Streamlit
exceptions that only appear after clicking through the UI.
Tool Choice
- Use focused unit/helper tests first when the bug is pure Python state logic.
- Use Streamlit AppTest when the failure is widget wiring, page hydration, or session-state initialization and no real browser behavior is needed.
- Use tools/agilab_web_robot.py for browser-level entrypoint checks, hosted demo checks, screenshots, and notebook upload handoff behavior.
- Use tools/agilab_widget_robot.py for page-by-page Streamlit widget flows, selected action buttons, artifact assertions, and stateful project journeys.
- Use tools/agilab_widget_robot_matrix.py when the change touches navigation, sidebar project actions, notebook import, settings, first-launch, or broad UI behavior that must stay consistent across pages.
- For Streamlit or React/agi-web page validation, inspect browser dev-log evidence too: console errors/warnings, pageerror, failed requests, and HTTP 4xx/5xx responses. A page is not validated just because the visible DOM rendered when the browser log shows a relevant runtime or asset failure.
Do not replace a deterministic helper regression with a slow robot. Robots are for user journeys, browser-only behavior, and release/public-demo evidence.
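The dev-log evidence listed above can be sketched as a small collector. This is an illustration under stated assumptions, not the implementation used by the AGILAB robots: the class name BrowserIssueLog and its method names are hypothetical, and the page argument is assumed to be a Playwright Page, whose console, pageerror, requestfailed, and response events are the hooks used here.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class BrowserIssueLog:
    """Hypothetical collector for the four kinds of browser evidence."""

    console: list = field(default_factory=list)
    page_errors: list = field(default_factory=list)
    failed_requests: list = field(default_factory=list)
    http_errors: list = field(default_factory=list)

    def attach(self, page) -> None:
        # Assumes `page` is a Playwright Page; the event names are Playwright's.
        page.on("console", self.on_console)
        page.on("pageerror", self.on_page_error)
        page.on("requestfailed", self.on_request_failed)
        page.on("response", self.on_response)

    def on_console(self, msg) -> None:
        # Keep only warnings and errors; plain log output is not evidence.
        if msg.type in ("error", "warning"):
            self.console.append((msg.type, msg.text))

    def on_page_error(self, error) -> None:
        self.page_errors.append(str(error))

    def on_request_failed(self, request) -> None:
        self.failed_requests.append((request.url, request.failure))

    def on_response(self, response) -> None:
        # HTTP 4xx/5xx responses count as issues.
        if response.status >= 400:
            self.http_errors.append((response.status, response.url))

    def is_clean(self) -> bool:
        return not (
            self.console or self.page_errors
            or self.failed_requests or self.http_errors
        )


# Stand-in objects replace real Playwright events so the sketch runs anywhere.
log = BrowserIssueLog()
log.on_console(SimpleNamespace(type="log", text="hydrated"))
log.on_response(SimpleNamespace(status=404, url="http://localhost:8501/static/app.js"))
print(log.is_clean())  # prints False: the 404 is recorded even though the DOM may render
```

The point of the design is that a run is judged on the collected issues, not on whether the page looked right.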
Preflight
- Confirm the repo is the source checkout you intend to test.
git status --short --branch --untracked-files=no
- Check the exact local workflow profile before inventing commands.
uv --preview-features extra-build-dependencies run python tools/workflow_parity.py --profile ui-robot-matrix --print-only
- If a change affects release evidence, also inspect the release shortcut.
./dev --print-only release
Fast Local Commands
Use this for a first-launch smoke when the entry shell, ABOUT page, or default navigation changed:
UV_PYTHON=3.13 uv --preview-features extra-build-dependencies run python tools/first_launch_robot.py --json --output /tmp/agilab-first-launch-robot.json
Use this for Streamlit dependency, run-configuration, theme, or blank-page frontend issues. It launches the dev app, checks JS/CSS MIME types, then verifies the first page hydrates in Chromium:
UV_PYTHON=3.13 uv --preview-features extra-build-dependencies run --extra ui --with playwright python tools/agilab_web_robot.py \
--frontend-smoke-only \
--timeout 45 \
--target-seconds 45 \
--json \
--screenshot-dir /tmp/agilab-frontend-smoke-screenshots \
> /tmp/agilab-frontend-smoke.json
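To make the JS/CSS MIME check concrete, here is a minimal sketch of the kind of comparison involved. It is not the code in tools/agilab_web_robot.py; the function name mime_mismatch and the EXPECTED_MIME table are assumptions made for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption for this sketch: content types accepted per asset extension.
EXPECTED_MIME = {
    ".js": {"application/javascript", "text/javascript"},
    ".css": {"text/css"},
}


def mime_mismatch(url: str, content_type_header: str):
    """Return a description of the mismatch, or None when the asset looks fine."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    expected = EXPECTED_MIME.get(suffix)
    if expected is None:
        return None  # not a JS/CSS asset, out of scope for this check
    # Drop parameters such as "; charset=utf-8" before comparing.
    served = content_type_header.split(";", 1)[0].strip().lower()
    if served in expected:
        return None
    return f"{url} served as {served or 'unknown'}, expected one of {sorted(expected)}"


# A script served as text/plain is the typical cause of a blank page.
print(mime_mismatch("http://localhost:8501/static/js/main.js", "text/plain"))
```

Browsers refuse to execute scripts or apply stylesheets served with the wrong type, which is why the page can return HTTP 200 and still stay blank.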
Use this for browser-level ABOUT and notebook handoff issues:
UV_PYTHON=3.13 uv --preview-features extra-build-dependencies run --extra ui --with playwright python tools/agilab_web_robot.py \
--json \
--screenshot-dir /tmp/agilab-web-robot-screenshots \
> /tmp/agilab-web-robot.json
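Because the shell redirect creates the JSON file even when the robot crashes early, it can help to confirm that the summary actually parses and that screenshots were written before reporting them. The sketch below makes no assumption about the robot's JSON schema; the helper name summarize_evidence and the assumption that screenshots are PNG files directly inside the screenshot directory are illustrative only.

```python
import json
import tempfile
from pathlib import Path


def summarize_evidence(json_path, screenshot_dir):
    """Report whether the JSON summary parses and which screenshots exist."""
    json_file = Path(json_path)
    json_parses = False
    if json_file.is_file():
        try:
            json.loads(json_file.read_text(encoding="utf-8"))
            json_parses = True
        except json.JSONDecodeError:
            json_parses = False  # e.g. an empty file left by a failed run
    shots_dir = Path(screenshot_dir)
    # Assumption: screenshots are PNG files directly inside the directory.
    screenshots = (
        sorted(p.name for p in shots_dir.glob("*.png")) if shots_dir.is_dir() else []
    )
    return {
        "json_path": str(json_file),
        "json_parses": json_parses,
        "screenshots": screenshots,
    }


# Demo on throwaway files, so nothing depends on a real robot run.
with tempfile.TemporaryDirectory() as tmp:
    summary_file = Path(tmp) / "robot.json"
    summary_file.write_text("{}", encoding="utf-8")
    (Path(tmp) / "about.png").write_bytes(b"")
    print(summarize_evidence(summary_file, tmp))
```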
Use this for a selected page/action journey. Keep labels exact and fail if the requested action is missing:
AGILAB_WIDGET_ROBOT_RUNTIME_ISOLATION=current-home \
UV_PYTHON=3.13 uv --preview-features extra-build-dependencies run --with playwright python tools/agilab_widget_robot.py \
--apps flight_project \
--pages ORCHESTRATE \
--apps-pages none \
--json \
--json-output /tmp/agilab-widget-robot.json \
--progress-log /tmp/agilab-widget-robot.ndjson \
--interaction-mode full \
--action-button-policy click-selected \
--click-action-labels "CHECK distribute" \
--preselect-labels "Run now" \
--missing-selected-action-policy fail \
--runtime-isolation current-home
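The exact-label rule in the command above can be illustrated with a short sketch. It is not the widget robot's implementation: the function name resolve_selected_actions is hypothetical, and only the fail policy comes from the command; any other policy value is treated as lenient here purely for illustration.

```python
def resolve_selected_actions(visible_labels, requested_labels, missing_policy="fail"):
    """Split requested labels into clickable and missing, matching text exactly."""
    visible = set(visible_labels)
    # Exact, case-sensitive comparison: "Check distribute" is not "CHECK distribute".
    to_click = [label for label in requested_labels if label in visible]
    missing = [label for label in requested_labels if label not in visible]
    if missing and missing_policy == "fail":
        raise LookupError(f"requested action(s) not visible: {missing}")
    return to_click, missing


clicks, missing = resolve_selected_actions(
    ["CHECK distribute", "Run now"], ["CHECK distribute"]
)
print(clicks, missing)  # prints ['CHECK distribute'] []
```

Failing on a missing label keeps a renamed or hidden button from turning into a silently skipped step that still reports success.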
Use this when an embedded ANALYSIS app surface must expose app-owned controls
without firing callbacks. The text and button probes inspect the top page and
child iframes, and --required-action-labels only trial-clicks buttons:
UV_PYTHON=3.13 uv --preview-features extra-build-dependencies run --with playwright python tools/agilab_widget_robot_matrix.py \
--scenario isolated-pytorch-playground-analysis \
--json \
--quiet-progress \
--no-result-cache \
--output-dir /tmp/agilab-pytorch-analysis-robot \
--screenshot-dir /tmp/agilab-pytorch-analysis-robot-screenshots
Use explicit browser-error evidence when React/agi-web, custom components,
iframes, or Streamlit frontend assets are part of the change. The widget robot
captures Chromium console warnings/errors, pageerror, failed requests, and
HTTP error responses into its JSON/progress evidence and failure bundle:
UV_PYTHON=3.13 uv --preview-features extra-build-dependencies run --with playwright python tools/agilab_widget_robot_matrix.py \
--scenario isolated-browser-error-core-pages \
--json \
--quiet-progress \
--no-result-cache \
--output-dir /tmp/agilab-browser-error-robot \
--screenshot-dir /tmp/agilab-browser-error-robot-screenshots
Use this before release or after broad navigation/sidebar work:
UV_PYTHON=3.13 uv --preview-features extra-build-dependencies run python tools/workflow_parity.py --profile ui-robot-matrix
Use this to inspect the exact sharded matrix commands without launching the robots:
uv --preview-features extra-build-dependencies run python tools/workflow_parity.py --profile ui-robot-matrix --print-only
Choosing Scenarios
- ABOUT / first-proof wizard: run first_launch_robot.py, the focused ABOUT tests, and at least the matrix scenario that covers entry and app pages. When the first visible copy or product journey wording changes, update tools/first_launch_robot.py expectations in the same change so CI validates the current pitch instead of stale labels.
- Streamlit dependency, pyproject.toml, run config, theme, or launch wrapper: run tools/agilab_web_robot.py --frontend-smoke-only first. This is the fastest real-browser guard for blank pages caused by static frontend assets being served with the wrong MIME type.
- React/agi-web, custom components, canvas/WebGL, or embedded iframe changes: run a Chromium/Chrome browser robot and inspect the captured browser issues even when the page looks correct. Treat relevant console errors, page errors, failed asset/API requests, and HTTP 4xx/5xx responses as validation failures unless there is an explicit ignore rule.
- PROJECT sidebar, create/import/rename/delete: run focused PROJECT tests plus matrix scenarios for project page, project-import-sidebar, project-rename-sidebar, and notebook import.
- Notebook import/upload: run notebook-import helper tests plus agilab_web_robot.py when the file chooser, upload handoff, or built-in notebook route changed.
- ORCHESTRATE action buttons: use agilab_widget_robot.py --action-button-policy click-selected with the exact visible button labels the end user is expected to press. If the button is intentionally not clicked by generic robots because it writes local state, launches external work, or is advisory-only, add or update its disposition in tools/ui_robot_action_contract.py and cover the behavior with focused helper/AppTest regressions. Examples include LAN discovery/cache controls and advisory planning actions such as Build cluster plan.
- When a UI action keeps the same visible button label but changes semantics behind a selector or multiselect, update the robot action disposition for the exact visible label and add focused regressions for the selector state. Do not rename robot dispositions to internal semantics such as Update selected unless that is the actual button text users see.
- SETTINGS or Streamlit system-menu changes: run settings page tests plus the settings matrix scenario.
- Public demo or HF Space UI: run tools/hf_space_smoke.py --json first, then run the web robot against the hosted URL if the claim is about browser-visible behavior.
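The rule in the React/agi-web item, that browser issues are failures unless an explicit ignore rule covers them, can be sketched as follows. The IgnoreRule shape and the unignored_issues helper are hypothetical and say nothing about how the AGILAB robots store their ignore rules; issues are modeled as simple (kind, detail) pairs.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class IgnoreRule:
    kind: str     # "console", "pageerror", "requestfailed", or "http"
    pattern: str  # regular expression searched in the issue detail
    reason: str   # why ignoring is acceptable; keeps the rule explicit


def unignored_issues(issues, rules):
    """Return the (kind, detail) issues not covered by any explicit rule."""
    remaining = []
    for kind, detail in issues:
        ignored = any(
            rule.kind == kind and re.search(rule.pattern, detail) for rule in rules
        )
        if not ignored:
            remaining.append((kind, detail))
    return remaining


rules = [IgnoreRule("http", r"/favicon\.ico$", "cosmetic asset, not part of the change")]
issues = [
    ("http", "404 http://localhost:8501/favicon.ico"),
    ("console", "error: Cannot read properties of undefined"),
]
print(unignored_issues(issues, rules))
# prints [('console', 'error: Cannot read properties of undefined')]
```

Requiring a reason on every rule means each ignored issue is a recorded decision that can be reviewed, not a blanket filter.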
Evidence Rules
- Save JSON summaries and progress logs under /tmp for local debugging, or under test-results/ only when the artifact is intentionally part of CI or release evidence.
- For any browser validation, inspect the dev-log evidence before declaring the page valid. In widget/matrix runs this means checking the JSON/progress output and, on failure, the browser-issues.json file in the failure bundle. For manual Chrome validation, open DevTools Console and Network and report whether relevant console errors/warnings, pageerror equivalents, failed requests, or HTTP 4xx/5xx responses were present.
- Use --screenshot-dir for browser/UI failures. Screenshots should include the manifest generated by the robot so evidence can be traced back to the command.
- For full matrix runs, prefer the sharded ui-robot-matrix profile. CI keeps successful scenarios lightweight and reruns only failed scenarios with --retry-failed-with-artifacts, producing trace, HAR, and video evidence under each shard's failure-artifacts/ directory.
- When diagnosing a matrix failure, inspect the aggregate artifact first. The ui-robot-matrix-aggregate-* report links the shard, failure bundle, replay command, artifact-retry status, and any trace/HAR/video directories.
- Use tools/ui_robot_failure_replay.py <bundle> to print the exact command recorded in a failure bundle, and add --execute only when you intentionally want to rerun that recorded command.
- In final notes, report the scenario name, command class, JSON output path, and whether screenshots were generated. Do not claim a full UI sweep when only one page or button was tested.
- If a robot fails from environment setup, missing Playwright, port c
Content truncated.