agentskills.codes

Install

mkdir -p .claude/skills/tdd-oimiragieo && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/13322" && unzip -o skill.zip -d .claude/skills/tdd-oimiragieo && rm skill.zip

Installs to .claude/skills/tdd-oimiragieo

Activation

This is the description your AI agent reads to decide when to run this skill — the better it matches your request, the more reliably it fires.

Canon TDD for humans and AI agents. Use for production code changes by writing tests first, proving RED, implementing minimal GREEN, and refactoring safely. 2026 edition adds TDP, flakiness gate, ralph-loop integration, memory-search in Step 0, PBT as Step 5.5, mutation testing gate in Step 4, and CJS LSP warning.

About this skill

Test-Driven Development (TDD)

Overview

This skill implements Canon TDD with AI-specific guardrails:

  1. Build or update a scenario list.
  2. Execute exactly one scenario as a runnable test.
  3. Prove RED.
  4. Implement minimum change for GREEN.
  5. Optionally refactor.
  6. Repeat until scenario list is empty.

When to Use

Use for:

  • New features
  • Bug fixes
  • Behavior changes
  • Repository-scale patching driven by tests
  • AI-assisted code generation where tests are executable specifications

Bypass this skill only with human approval, and only for:

  • Throwaway prototypes
  • Purely declarative config edits with no execution path
  • One-off migration scripts that will not be maintained

The Iron Law

NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST

If code was written first, discard and restart from RED.

Canon Loop

Step 0: Create/refresh scenario backlog

Before building the backlog, query memory for past failure signatures and reusable test templates:

Skill({ skill: 'memory-search' }); // query: "<feature-name> test failure signatures"

Read .claude/context/memory/learnings.md for recurring anti-patterns relevant to this task.

Then:

  • Keep a short ordered list of test scenarios for this task.
  • Prioritize by design signal and risk, not by implementation convenience.
  • Add discovered scenarios during execution.
  • Reuse templates from memory — do not repeat failure patterns already documented.
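One way to keep the backlog ordered is sketched below. The `risk` and `designSignal` scores (1 to 5) and their equal weighting are assumptions made for illustration; the skill itself does not prescribe a scoring scheme.

```javascript
// Orders scenarios by combined risk and design signal, highest first.
// Both fields are hypothetical 1-5 scores assigned when a scenario is added.
function prioritize(scenarios) {
  return [...scenarios].sort(
    (a, b) => (b.risk + b.designSignal) - (a.risk + a.designSignal)
  );
}

const backlog = prioritize([
  { id: 'sc-a', risk: 1, designSignal: 5 },
  { id: 'sc-b', risk: 3, designSignal: 1 },
  { id: 'sc-c', risk: 3, designSignal: 4 },
]);
```

The input array is copied before sorting, so scenarios discovered later can be appended and the list re-prioritized without mutating earlier state.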

Step 1: Pick exactly one scenario and write one runnable test

  • One behavior per cycle.
  • Use clear behavior names.
  • Favor real collaborators; mock only external boundaries.

Step 2: Prove RED

  • Run the narrowest test command.
  • Failure must be due to missing behavior, not syntax or setup errors.
  • Record red evidence (test file and failing assertion message).
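A rough way to separate a genuine RED from a setup failure is to inspect the captured output. The patterns below are a heuristic assumption, not an exhaustive list, and `classifyFailure` is a hypothetical helper name.

```javascript
// Hypothetical heuristic: output that suggests the test never reached its assertion.
const SETUP_ERROR_PATTERNS = [/SyntaxError/, /Cannot find module/, /ERR_MODULE_NOT_FOUND/];

// Returns 'red' only when the output shows an assertion failure and no setup error.
function classifyFailure(output) {
  if (SETUP_ERROR_PATTERNS.some(pattern => pattern.test(output))) {
    return 'setup-error';
  }
  if (/AssertionError/.test(output)) {
    return 'red';
  }
  return 'unknown'; // needs a human or agent to read the output
}
```

Only a `'red'` result counts as evidence for this step; anything else means the test or its setup should be fixed before continuing.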

Step 3: Implement minimum GREEN patch

  • Implement only what current red test requires.
  • No speculative APIs or unrelated cleanup.
  • Keep patch bounded to current scenario.

Step 4: Prove GREEN

  • Re-run narrow test command.
  • Run impacted suite (or package-level test set).
  • Confirm no regressions.

Flakiness Gate (mandatory for async, hook, or nondeterministic tests):

For tests that involve async I/O, stop hooks, timers, or file system operations, a single pass is insufficient. Require 3 consecutive passes before declaring GREEN:

# Run 3 times — all 3 must pass
node --test tests/hooks/routing-guard.test.cjs && \
node --test tests/hooks/routing-guard.test.cjs && \
node --test tests/hooks/routing-guard.test.cjs

A test that passes once and fails on the second run is RED, not GREEN. Do not advance to Step 5 until 3 consecutive passes are confirmed.
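The same gate can be scripted from Node. This is a minimal sketch; `passesConsecutively` is a hypothetical helper, and it assumes a non-zero exit code is the only failure signal that matters.

```javascript
const { spawnSync } = require('node:child_process');

// Runs `command args` up to `requiredPasses` times and stops at the first failure.
// Returns true only if every run exits with code 0.
function passesConsecutively(command, args, requiredPasses = 3) {
  for (let run = 1; run <= requiredPasses; run += 1) {
    const result = spawnSync(command, args, { encoding: 'utf-8' });
    if (result.status !== 0) {
      return false; // a single failure means the test is still RED
    }
  }
  return true;
}
```

For the hook test above, the call would be `passesConsecutively('node', ['--test', 'tests/hooks/routing-guard.test.cjs'])`.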

Mutation Testing Gate (security-critical code only):

For security hooks, routing validators, auth logic, and any code path that controls access or trust decisions, run Stryker mutation testing after achieving GREEN to verify that tests genuinely catch faults and are not vacuously passing.

# Run Stryker mutation testing (threshold: 85%)
npx stryker run
# Enforce the threshold by setting thresholds.break to 85 in stryker.config.json
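A configuration along these lines would make the run fail below the threshold. It is a sketch written as a JavaScript config module; the `mutate` glob and the `npm test` command are assumptions about the repository layout, and the same keys can be expressed in a JSON config.

```javascript
// Hypothetical Stryker configuration; adjust paths and commands to the repository.
const strykerConfig = {
  mutate: ['hooks/**/*.cjs'], // assumed location of the security-critical code
  testRunner: 'command', // run the existing test command for each mutant
  commandRunner: { command: 'npm test' },
  // The run exits non-zero when the mutation score falls below `break`
  thresholds: { high: 90, low: 85, break: 85 },
};

module.exports = strykerConfig;
```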

For fast-check-based property tests on security hooks, the fail-closed property is the mutation-equivalent gate:

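const assert = require('node:assert');
const fc = require('fast-check');
// securityHook is the hook under test; require it from its own module.

// fast-check fail-closed property: must hold for any input
fc.assert(
  fc.property(fc.anything(), input => {
    const result = securityHook(input);
    // Hook must NEVER return allow=true for malformed/unexpected input
    assert.notStrictEqual(result.allow, true);
  })
);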

Skip this gate for non-security application code (Step 4 → Step 5 directly).
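To make the fail-closed property concrete, here is a dependency-free sketch. The hook body, the `tool` field, and the allow-list are all hypothetical, and a hand-picked list of malformed inputs stands in for `fc.anything()` so the snippet runs without fast-check.

```javascript
// Hypothetical hook: only a well-formed request for an allow-listed tool passes.
const ALLOWED_TOOLS = new Set(['Read', 'Grep']);

function securityHook(input) {
  try {
    const isPlainObject = input !== null && typeof input === 'object' && !Array.isArray(input);
    if (isPlainObject && typeof input.tool === 'string' && ALLOWED_TOOLS.has(input.tool)) {
      return { allow: true };
    }
  } catch (error) {
    // Any unexpected error (for example a throwing getter) falls through to deny
  }
  return { allow: false }; // default answer is always deny
}

// Hand-picked malformed inputs standing in for generated ones
const malformedInputs = [undefined, null, 42, 'Read', [], {}, { tool: 7 }, { tool: 'Write' }];
const leaks = malformedInputs.filter(input => securityHook(input).allow === true);
console.log(leaks.length); // prints 0
```

The design point is that `allow: true` is reachable from exactly one branch, so a mutation that loosens any check should be caught by a test that feeds in unexpected input.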

Step 5: Optional refactor

  • Refactor only with green tests.
  • Re-run the same test set after refactor.

Step 5.5: Property-Based Testing (recommended for utility functions and security hooks)

After refactor (or after Step 4 for security-critical code), consider supplementing example-based tests with property-based tests. For LLM code generation, PBT is reported to improve pass@1 by 23.1–37.3% over example-based TDD alone (arXiv:2506.18315) by breaking the self-deception cycle, in which generated tests share the same flaws as the generated code.

When to invoke:

  • Utility functions (encode/decode, parsers, serializers, calculators)
  • Security hooks (input validators, sanitizers, access control logic)
  • Any function where invariants, round-trip properties, or mathematical properties can be stated

Invocation:

Skill({ skill: 'property-based-testing' });

Key property patterns to identify:

Pattern                  Example
Round-trip               decode(encode(x)) === x
Idempotence              normalize(normalize(x)) === normalize(x)
Invariant                sort(arr).length === arr.length
Fail-closed (security)   securityHook(anyInput).allow !== true (unless explicitly whitelisted)
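The round-trip and idempotence patterns can be tried without any dependency, as sketched below. The generator and `findCounterexample` are hand-rolled stand-ins, not the fast-check API, and `encode`, `decode`, and `normalize` are hypothetical functions under test. A real property test would use a library with shrinking.

```javascript
// Tiny seeded pseudo-random generator so any failure is reproducible.
function makeRng(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0; // linear congruential step
    return state / 2 ** 32;
  };
}

// Generates a random string of printable ASCII characters.
function randomString(rng, maxLength = 20) {
  const length = Math.floor(rng() * (maxLength + 1));
  let text = '';
  for (let i = 0; i < length; i += 1) {
    text += String.fromCharCode(32 + Math.floor(rng() * 95));
  }
  return text;
}

// Returns the first input that violates `property`, or null if every run holds.
function findCounterexample(property, runs = 200, seed = 1) {
  const rng = makeRng(seed);
  for (let i = 0; i < runs; i += 1) {
    const input = randomString(rng);
    if (!property(input)) return input;
  }
  return null;
}

// Hypothetical functions under test
const encode = text => Buffer.from(text, 'utf-8').toString('base64');
const decode = text => Buffer.from(text, 'base64').toString('utf-8');
const normalize = text => text.trim().toLowerCase();

const roundTrip = findCounterexample(x => decode(encode(x)) === x);
const idempotence = findCounterexample(x => normalize(normalize(x)) === normalize(x));
```

A `null` result means no counterexample was found in the sampled inputs, which is evidence for the property rather than proof of it.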

PBT is a supplement to Canon TDD, not a replacement. Canon RED/GREEN/REFACTOR completes first; PBT runs after GREEN is confirmed.

Step 6: Repeat until backlog empty

AI-Assisted Guardrails

  • Use tests as executable prompt context; keep prompts short and test-focused.
  • Prefer deterministic tests (stable fixtures, no nondeterministic ordering).
  • Use bounded repair loops: max 3 repair attempts per scenario before redesign.
  • Run anti-test-hacking checks:
    • Verify changed assertions still express original requirement.
    • Add at least one negative test for bug-fix tasks.
  • Ensure code does not branch on test-only artifacts.
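The bounded repair loop can be sketched as follows. `runTest` and `attemptRepair` are hypothetical callbacks standing in for a test run and an agent call.

```javascript
// Calls `attemptRepair` until `runTest` passes or the attempt budget is spent.
function boundedRepair(runTest, attemptRepair, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    attemptRepair(attempt);
    if (runTest()) {
      return { status: 'green', attempts: attempt };
    }
  }
  // Budget exhausted: stop patching and revisit the design instead
  return { status: 'redesign', attempts: maxAttempts };
}
```

Returning an explicit `'redesign'` status, instead of looping indefinitely, gives the caller a clear signal to revisit the scenario.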

Memory Acceleration Layer

Use lightweight memory only to reduce repeated setup and triage:

  • preferred repo-local test/lint/format commands
  • recurring failure signatures and short fix summaries
  • recurring anti-pattern reminders
  • reusable scenario templates

Reference: references/tdd-memory-profile.md

Hard rules:

  • memory never bypasses RED proof
  • memory never changes Canon sequence
  • keep profile bounded and low-noise

Test-Driven Prompting (TDP) — 2026 Standard Pattern

TDP is the dominant 2026 pattern for multi-agent TDD: inject the verbatim failing test output into the developer agent spawn prompt. This eliminates interpretation errors — the developer sees exactly what the test runner sees.

Pattern

Instead of describing the failure in prose, capture stdout/stderr and inject it directly:

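// Step 1: Run test and capture raw output
const { execSync } = require('child_process');
let testOutput = '';
try {
  // stdio: 'pipe' keeps both streams on the thrown error instead of the terminal
  execSync('node --test tests/hooks/routing-guard.test.cjs', { encoding: 'utf-8', stdio: 'pipe' });
} catch (e) {
  testOutput = `${e.stdout || ''}${e.stderr || ''}`; // Verbatim failure output
}
// An empty capture means the test passed: RED is not proven, so do not spawn the developer
if (!testOutput) throw new Error('Expected a failing test, but the command exited 0');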

// Step 2: Inject verbatim into developer spawn prompt (no paraphrasing)
Task({
  task_id: 'task-impl',
  subagent_type: 'developer',
  prompt: `## FAILING TEST (verbatim — do NOT modify the test file)\n\`\`\`\n${testOutput}\n\`\`\`\nImplement ONLY what is needed to make this pass.`,
});

Why TDP Works

  • Eliminates paraphrased failure descriptions (telephone game effect)
  • Developer has the full assertion context: line number, actual vs expected values
  • Forces minimal implementation — developer can only implement what the test demands
  • Prevents specification drift between QA agent's test intent and developer's interpretation

TDP + Multi-Agent TDD Decomposition

Step  Agent             Action
1     qa                Write failing test, commit test-only, capture raw output
2     Router            Extract test output, build TDP spawn prompt
3     developer         Implement to GREEN using verbatim test output as spec
4     reflection-agent  Verify no test assertions were modified (git diff check)

Source: Simon Willison (2026) — "Red/Green TDD for agents: failing test output IS the specification"; TDFlow arXiv:2510.23761.
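The git diff check in step 4 could look roughly like this. It is a sketch that scans unified diff text for removed assertion lines in test files; the `.test.js`/`.test.cjs`/`.test.mjs` naming and the `assert`/`expect` keywords are assumptions, and `findRemovedAssertions` is a hypothetical helper.

```javascript
// Scans unified diff text (for example the output of `git diff`) and returns
// removed lines that contain an assertion inside test files.
function findRemovedAssertions(diffText) {
  const removed = [];
  let currentFile = null;
  for (const line of diffText.split('\n')) {
    if (line.startsWith('--- ')) {
      // "--- a/path" names the pre-change file
      currentFile = line.slice(4).replace(/^a\//, '');
      continue;
    }
    if (line.startsWith('+++ ')) continue;
    const isTestFile = currentFile !== null && /\.test\.[cm]?js$/.test(currentFile);
    if (isTestFile && line.startsWith('-') && /\b(assert|expect)\b/.test(line)) {
      removed.push({ file: currentFile, line: line.slice(1).trim() });
    }
  }
  return removed;
}
```

A non-empty result does not prove test hacking, but it marks the lines the reflection-agent should compare against the original requirement.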

Autonomous TDD with ralph-loop (Session-Persistent Iteration)

For repository-scale TDD where sessions may be interrupted, wire ralph-loop (Mode 2 — router-managed) to maintain the TDD scenario backlog across interruptions:

TDD State Schema

Maintain a TDD-specific state file at .claude/context/runtime/tdd-state.json:

{
  "scenarios": [
    {
      "id": "sc-001",
      "description": "routing-guard blocks Write on creator paths",
      "status": "pending"
    },
    { "id": "sc-002", "description": "spawn-token-guard warns at 80K tokens", "status": "green" }
  ],
  "completedScenarios": [
    {
      "id": "sc-002",
      "evidenceCommand": "node --test tests/hooks/spawn-token-guard.test.cjs",
      "passedAt": "2026-03-12T10:00:00Z"
    }
  ],
  "currentScenario": "sc-001",
  "evidenceLog": [
    {
      "scenarioId": "sc-001",
      "phase": "red",
      "output": "AssertionError: expected exit code 2, got 0",
      "timestamp": "..."
    }
  ]
}
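A helper for recording a GREEN result against this schema might look like the sketch below. `markScenarioGreen` is a hypothetical name, and the rule that `currentScenario` moves to the next pending scenario is an assumption.

```javascript
// Returns a new state in which scenario `id` is marked green and logged as completed.
// `now` is injectable so tests stay deterministic.
function markScenarioGreen(state, id, evidenceCommand, now = () => new Date().toISOString()) {
  const scenarios = (state.scenarios || []).map(scenario =>
    scenario.id === id ? { ...scenario, status: 'green' } : scenario
  );
  const completed = state.completedScenarios || [];
  const alreadyCompleted = completed.some(entry => entry.id === id);
  const completedScenarios = alreadyCompleted
    ? completed
    : [...completed, { id, evidenceCommand, passedAt: now() }];
  // Assumed rule: advance to the first scenario that is still pending
  const next = scenarios.find(scenario => scenario.status === 'pending');
  return { ...state, scenarios, completedScenarios, currentScenario: next ? next.id : null };
}
```

The function returns a new object instead of mutating its input, so the previous state can be kept until the write to `tdd-state.json` succeeds.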

Resume Pattern

At the start of each iteration, read the TDD state file:

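const fs = require('node:fs');

// Step 0: before building/refreshing backlog
const statePath = '.claude/context/runtime/tdd-state.json';
// readFileSync throws when the file is missing, so check first and fall back to an empty state
const raw = fs.existsSync(statePath) ? fs.readFileSync(statePath, 'utf-8') : '';
const state = JSON.parse(raw || '{}');
const completedIds = (state.completedScenarios || []).map(s => s.id);
const remaining = (state.scenarios || []).filter(s => !completedIds.includes(s.id));
// Pick next scenario from remaining; never re-run completed ones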

Integration with ralph-loop Mode 2

  1. Router spawns qa agent with { task_id, subagent_type: 'qa', prompt: TDP_PROMPT + verbatim state }
  2. qa writes test → runs → captures output → updates tdd-state.json (phase: red)
  3. Router spawns developer with TDP prompt (verbatim test output injected)
  4. developer i

Content truncated.
