gtkb-benchmarks
Run GT-KB read-only measurement benchmarks. Outputs JSON plus markdown summary.
Install
mkdir -p .claude/skills/gtkb-benchmarks && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/13810" && unzip -o skill.zip -d .claude/skills/gtkb-benchmarks && rm skill.zip
Installs to .claude/skills/gtkb-benchmarks
Activation
This is the description your AI agent reads to decide when to run this skill — the better it matches your request, the more reliably it fires.
Run GT-KB read-only measurement benchmarks. Outputs JSON plus markdown summary.
About this skill
GT-KB Benchmark Suite
Read-only measurement benchmarks for the GT-KB platform. Each benchmark computes a structured observation with a headline scalar plus per-dimension breakdown, and emits both JSON and markdown summary artifacts.
Operationalizes SPEC-1662 (GOV-18 Assertion Quality Standard) and GOV-ARTIFACT-ORIENTED-GOVERNANCE-001 per Self-Diagnostic Leak Closure Slice 2.
Benchmarks
| ID | Question Answered |
|---|---|
| linkage_heatmap | What fraction of cross-artifact references survive across SPEC, WI, ADR or DCL or GOV, DELIB, BRIDGE pairs? |
| recall_coverage | What fraction of recent mutations cite prior-state evidence in change_reason? |
| tool_identification | What fraction of recent insertions carry a structured attribution marker? |
| deliberation_recall | What is the recall@3 of the semantic index over recent owner-decision deliberations? |
| advisory_latency | What is the median wall-clock latency from advisory filing to first Prime acknowledgement? |
| assertion_signal_noise | What fraction of categorized assertions land outside chronic_noise (signal-bearing)? |
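As an illustration of the "headline scalar plus per-dimension breakdown" shape, here is a minimal sketch of how the assertion_signal_noise fraction could be computed. The `Observation` class, the `signal_fraction` helper, and every category name other than chronic_noise are hypothetical; the suite's actual result schema is not shown here.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Observation:
    """Hypothetical result shape: one headline scalar plus a breakdown."""
    benchmark_id: str
    value: Optional[float]  # headline scalar; None when nothing was measured
    breakdown: Dict[str, int] = field(default_factory=dict)


def signal_fraction(categories: List[str]) -> Observation:
    """Fraction of categorized assertions that fall outside chronic_noise."""
    counts = Counter(categories)
    total = sum(counts.values())
    noise = counts.get("chronic_noise", 0)
    # Avoid dividing by zero when the window contains no categorized assertions.
    value = (total - noise) / total if total else None
    return Observation("assertion_signal_noise", value, dict(counts))


# Example categories are made up for illustration.
obs = signal_fraction(["chronic_noise", "regression", "regression", "flaky"])
print(obs.value, obs.breakdown)
```

Returning `None` instead of `0.0` for an empty window is a design choice in this sketch: it keeps "no data" distinguishable from "all noise".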
Subcommands
run
Execute one or all benchmarks. Defaults to a one-year window ending now.
python -m scripts.benchmarks.cli run --all
python -m scripts.benchmarks.cli run --benchmark assertion_signal_noise
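The default window can be pictured with a short sketch. It assumes "one year" means 365 days and that timestamps are UTC; neither detail is stated above, and `default_window` is a hypothetical helper, not part of the CLI.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def default_window(now: Optional[datetime] = None,
                   days: int = 365) -> Tuple[datetime, datetime]:
    """Return (start, end) for a window of `days` days ending at `now`.

    Assumption: one year is treated as 365 days, measured in UTC.
    """
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    return start, end


start, end = default_window()
print(start.isoformat(), "->", end.isoformat())
```

Passing `now` explicitly keeps the window reproducible, which matters if the window bounds feed into a run's identity.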
report
Print a previously emitted run summary.
python -m scripts.benchmarks.cli report --run-id 20260514-040000
compare
Diff two runs by idempotency_key and benchmark value.
python -m scripts.benchmarks.cli compare --baseline RUN_A --candidate RUN_B
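The comparison can be sketched roughly as follows. This assumes each run.json payload holds `results` as a mapping from benchmark ID to an object with a `value` field; that layout and the `compare_runs` helper are assumptions for illustration, not the tool's documented schema.

```python
from typing import Any, Dict, Optional


def compare_runs(baseline: Dict[str, Any],
                 candidate: Dict[str, Any]) -> Dict[str, Any]:
    """Diff two run payloads by idempotency_key and per-benchmark value."""
    # Matching keys mean both runs measured the same inputs at the same commit.
    same_inputs = baseline["idempotency_key"] == candidate["idempotency_key"]

    deltas: Dict[str, Optional[float]] = {}
    shared = set(baseline["results"]) & set(candidate["results"])
    for bench_id in sorted(shared):
        before = baseline["results"][bench_id]["value"]
        after = candidate["results"][bench_id]["value"]
        # A missing value on either side cannot be diffed.
        if before is None or after is None:
            deltas[bench_id] = None
        else:
            deltas[bench_id] = after - before
    return {"same_inputs": same_inputs, "deltas": deltas}


# Made-up payloads, trimmed to the fields the sketch reads.
run_a = {"idempotency_key": "aaa", "results": {"recall_coverage": {"value": 0.5}}}
run_b = {"idempotency_key": "bbb", "results": {"recall_coverage": {"value": 0.75}}}
print(compare_runs(run_a, run_b))
```

Benchmarks present in only one run are skipped in this sketch; a fuller comparison would probably report them separately.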
Output Contract
Each run writes two files under the runs directory:
- run.json -- full structured payload (run_id, idempotency_key, results).
- summary.md -- human-readable markdown summary table.
The idempotency_key is a SHA-256 hash of the window bounds, benchmark IDs, and source commit. Identical inputs over identical commits produce identical keys.
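One way such a key could be derived is sketched below. The canonical JSON serialization, the field names, and the sorting of benchmark IDs are assumptions made for this example; the actual byte layout hashed by the suite is not specified above.

```python
import hashlib
import json
from typing import Iterable


def idempotency_key(window_start: str, window_end: str,
                    benchmark_ids: Iterable[str], source_commit: str) -> str:
    """SHA-256 over window bounds, benchmark IDs, and source commit."""
    payload = {
        "window": [window_start, window_end],
        # Assumption: IDs are sorted so their order does not change the key.
        "benchmarks": sorted(benchmark_ids),
        "commit": source_commit,
    }
    # Stable serialization: sorted keys and fixed separators.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder inputs for illustration only.
key = idempotency_key("2025-05-14T04:00:00Z", "2026-05-14T04:00:00Z",
                      ["recall_coverage", "linkage_heatmap"], "abc1234")
print(key)
```

Serializing to a canonical form before hashing is what makes identical inputs yield identical keys regardless of dictionary or argument ordering.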
Governing Artifacts
- SPEC-1662 (GOV-18 Assertion Quality Standard)
- GOV-ARTIFACT-ORIENTED-GOVERNANCE-001
- GOV-STANDING-BACKLOG-001
- ADR-DA-READ-SURFACE-PLACEMENT-001
- DELIB-S312-DETERMINISTIC-SERVICES-PRINCIPLE