gtkb-benchmarks

Run GT-KB read-only measurement benchmarks. Outputs JSON plus markdown summary.

Install

mkdir -p .claude/skills/gtkb-benchmarks && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/13810" && unzip -o skill.zip -d .claude/skills/gtkb-benchmarks && rm skill.zip

Installs to .claude/skills/gtkb-benchmarks

About this skill

<!-- GTKB-ANTIGRAVITY-SKILL-ADAPTER Generated: true Generated by: scripts/generate_antigravity_skill_adapters.py Canonical source: .claude/skills/gtkb-benchmarks/SKILL.md Canonical source sha256: 95940f9fecb54eee364558ad2c15b6f34395a1e2a0d8d6e760b90e80c62926be Generated at: 2026-06-11T18:34:14Z Do not edit this adapter directly. Edit the canonical source and regenerate. GTKB-ANTIGRAVITY-SKILL-ADAPTER -->

GT-KB Benchmark Suite

Read-only measurement benchmarks for the GT-KB platform. Each benchmark computes a structured observation with a headline scalar plus per-dimension breakdown, and emits both JSON and markdown summary artifacts.

Operationalizes SPEC-1662 (GOV-18 Assertion Quality Standard) and GOV-ARTIFACT-ORIENTED-GOVERNANCE-001 per Self-Diagnostic Leak Closure Slice 2.
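As a rough illustration of the "headline scalar plus per-dimension breakdown" shape described above, a result could be modeled along the following lines. The class name `Observation`, its field names, and the sample numbers are all hypothetical and chosen for illustration; the actual payload schema is not documented here.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Observation:
    """Hypothetical shape of one benchmark result (field names are assumptions)."""

    benchmark_id: str
    headline: float  # single scalar summarizing the benchmark
    breakdown: dict[str, float] = field(default_factory=dict)  # per-dimension values

    def to_json(self) -> str:
        # Sorted keys keep the serialized form stable across runs.
        return json.dumps(asdict(self), sort_keys=True)

    def to_markdown_row(self) -> str:
        # One row of a markdown summary table.
        return f"| {self.benchmark_id} | {self.headline:.3f} |"


# Made-up example values, not real measurements.
obs = Observation(
    "assertion_signal_noise",
    0.75,
    {"signal": 0.75, "chronic_noise": 0.25},
)
```

Keeping one object that can render both forms is one way to ensure the JSON and markdown artifacts never disagree.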

Benchmarks

| ID | Question Answered |
| --- | --- |
| linkage_heatmap | What fraction of cross-artifact references survive across SPEC, WI, ADR or DCL or GOV, DELIB, BRIDGE pairs? |
| recall_coverage | What fraction of recent mutations cite prior-state evidence in change_reason? |
| tool_identification | What fraction of recent insertions carry a structured attribution marker? |
| deliberation_recall | What is the recall at 3 of the semantic index over recent owner-decision deliberations? |
| advisory_latency | What is the median wall-clock latency from advisory filing to first Prime acknowledgement? |
| assertion_signal_noise | What fraction of categorized assertions land outside chronic_noise (signal-bearing)? |
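For readers unfamiliar with "recall at 3", the sketch below shows the conventional definition: the share of relevant items that appear among the top three ranked results. Whether deliberation_recall uses exactly this definition is an assumption, and the function name and document IDs are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k=3):
    """Fraction of relevant items found in the top-k ranked results.

    Conventional recall@k; not confirmed to match the benchmark's own formula.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0  # nothing to recall; avoid dividing by zero
    hits = relevant.intersection(ranked_ids[:k])
    return len(hits) / len(relevant)


# Hypothetical query: two relevant deliberations, only one ranked in the top 3.
score = recall_at_k(["d1", "d7", "d3", "d2"], ["d1", "d2"])
```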

Subcommands

run

Execute one or all benchmarks. Defaults to a one-year window ending now.

python -m scripts.benchmarks.cli run --all
python -m scripts.benchmarks.cli run --benchmark assertion_signal_noise
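The default window can be pictured with the sketch below. It assumes "one year" means 365 days and that timestamps are in UTC; neither detail is stated in this document, and `default_window` is a hypothetical helper, not part of the CLI.

```python
from datetime import datetime, timedelta, timezone


def default_window(now=None, days=365):
    """Return (start, end) for a window of `days` ending at `now`.

    Assumption: a year is treated as 365 days and times are UTC.
    """
    end = now or datetime.now(timezone.utc)
    return end - timedelta(days=days), end


# Fixed "now" so the example is reproducible.
start, end = default_window(datetime(2026, 5, 14, 4, 0, tzinfo=timezone.utc))
```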

report

Print a previously emitted run summary.

python -m scripts.benchmarks.cli report --run-id 20260514-040000

compare

Diff two runs by idempotency_key and benchmark value.

python -m scripts.benchmarks.cli compare --baseline RUN_A --candidate RUN_B
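The comparison can be approximated as follows. This sketch assumes each run payload is a dict whose `results` field maps benchmark ID to a headline number; the real layout of `results` is not specified here, and `compare_runs` is a hypothetical name.

```python
def compare_runs(baseline, candidate):
    """Diff two run payloads by idempotency_key and per-benchmark value.

    Assumption: payload["results"] maps benchmark ID -> headline value.
    """
    same_inputs = baseline["idempotency_key"] == candidate["idempotency_key"]
    deltas = {}
    all_ids = set(baseline["results"]) | set(candidate["results"])
    for bench_id in sorted(all_ids):
        before = baseline["results"].get(bench_id)
        after = candidate["results"].get(bench_id)
        # A benchmark missing from either run has no meaningful delta.
        delta = None if before is None or after is None else after - before
        deltas[bench_id] = {"baseline": before, "candidate": after, "delta": delta}
    return {"same_inputs": same_inputs, "deltas": deltas}


# Made-up payloads for illustration only.
run_a = {"idempotency_key": "k1", "results": {"recall_coverage": 0.5}}
run_b = {"idempotency_key": "k2", "results": {"recall_coverage": 0.75, "advisory_latency": 12.0}}
diff = compare_runs(run_a, run_b)
```

Matching keys indicate the two runs measured the same inputs, so any value difference would point at nondeterminism rather than changed data.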

Output Contract

Each run writes two files under the runs directory:

  • run.json -- full structured payload (run_id, idempotency_key, results).
  • summary.md -- human-readable markdown summary table.

The idempotency_key is a SHA-256 hash of the window bounds, benchmark IDs, and source commit. Identical inputs over identical commits produce identical keys.
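A minimal sketch of how such a key could be derived is shown below. The canonical JSON serialization, the sorting of benchmark IDs, and the function signature are assumptions made for illustration; only the inputs (window bounds, benchmark IDs, source commit) and the use of SHA-256 come from the description above.

```python
import hashlib
import json


def idempotency_key(window_start, window_end, benchmark_ids, source_commit):
    """SHA-256 over window bounds, benchmark IDs, and source commit.

    Assumption: inputs are serialized as compact JSON with sorted keys,
    and benchmark IDs are sorted so their order does not change the key.
    """
    payload = json.dumps(
        {
            "window": [window_start, window_end],
            "benchmarks": sorted(benchmark_ids),
            "commit": source_commit,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical inputs; "abc123" stands in for a real commit hash.
key_1 = idempotency_key("2025-05-14T04:00:00Z", "2026-05-14T04:00:00Z",
                        ["recall_coverage", "advisory_latency"], "abc123")
key_2 = idempotency_key("2025-05-14T04:00:00Z", "2026-05-14T04:00:00Z",
                        ["advisory_latency", "recall_coverage"], "abc123")
key_3 = idempotency_key("2025-05-14T04:00:00Z", "2026-05-14T04:00:00Z",
                        ["recall_coverage", "advisory_latency"], "def456")
```

Here `key_1` and `key_2` match because the inputs are identical up to ordering, while `key_3` differs because the commit changed.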

Governing Artifacts

  • SPEC-1662 (GOV-18 Assertion Quality Standard)
  • GOV-ARTIFACT-ORIENTED-GOVERNANCE-001
  • GOV-STANDING-BACKLOG-001
  • ADR-DA-READ-SURFACE-PLACEMENT-001
  • DELIB-S312-DETERMINISTIC-SERVICES-PRINCIPLE
