tdd

Name: tdd
Author: oimiragieo

Install

mkdir -p .claude/skills/tdd-oimiragieo && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/13322" && unzip -o skill.zip -d .claude/skills/tdd-oimiragieo && rm skill.zip

Installs to .claude/skills/tdd-oimiragieo

Activation

Canon TDD for humans and AI agents. Use for production code changes by writing tests first, proving RED, implementing minimal GREEN, and refactoring safely. 2026 edition adds TDP, flakiness gate, ralph-loop integration, memory-search in Step 0, PBT as Step 5.5, mutation testing gate in Step 4, and CJS LSP warning.

About this skill

Test-Driven Development (TDD)

Overview

This skill implements Canon TDD with AI-specific guardrails:

Build or update a scenario list.
Execute exactly one scenario as a runnable test.
Prove RED.
Implement minimum change for GREEN.
Optionally refactor.
Repeat until scenario list is empty.

When to Use

Use for:

New features
Bug fixes
Behavior changes
Repository-scale patching driven by tests
AI-assisted code generation where tests are executable specifications

Bypass TDD only with human approval, and only for:

Throwaway prototypes
Purely declarative config edits with no execution path
One-off migration scripts that will not be maintained

The Iron Law

NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST

If code was written first, discard and restart from RED.

Canon Loop

Step 0: Create/refresh scenario backlog

Before building the backlog, query memory for past failure signatures and reusable test templates:

Skill({ skill: 'memory-search' }); // query: "<feature-name> test failure signatures"

Read .claude/context/memory/learnings.md for recurring anti-patterns relevant to this task.

Then:

Keep a short ordered list of test scenarios for this task.
Prioritize by design signal and risk, not by implementation convenience.
Add discovered scenarios during execution.
Reuse templates from memory — do not repeat failure patterns already documented.
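
The ordering rule above can be sketched as a small sortable structure. This is illustrative only: the `designSignal` and `risk` fields, their 1 to 5 scale, and the `prioritize` and `addScenario` helpers are assumptions, not part of the skill.

```javascript
// Hypothetical backlog shape: field names and the 1-5 scoring scale are assumptions.
const backlog = [
  { id: 'sc-001', description: 'rejects empty input', designSignal: 2, risk: 5 },
  { id: 'sc-002', description: 'parses a single record', designSignal: 5, risk: 3 },
  { id: 'sc-003', description: 'formats help text', designSignal: 1, risk: 1 },
];

// Order by combined design signal and risk, highest first.
// Implementation convenience is deliberately not an input.
function prioritize(scenarios) {
  return [...scenarios].sort(
    (a, b) => b.designSignal + b.risk - (a.designSignal + a.risk)
  );
}

// Scenarios discovered mid-cycle are appended, then the list is re-ordered.
function addScenario(scenarios, scenario) {
  return prioritize([...scenarios, scenario]);
}

const ordered = prioritize(backlog);
console.log(ordered.map(s => s.id).join(', '));
```

Summing the two scores is only one possible weighting; a team that cares more about risk could multiply it instead.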

Step 1: Pick exactly one scenario and write one runnable test

One behavior per cycle.
Use clear behavior names.
Favor real collaborators; mock only external boundaries.

Step 2: Prove RED

Run the narrowest test command.
Failure must be due to missing behavior, not syntax or setup errors.
Record red evidence (test file and failing assertion message).
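
The distinction between a genuine RED and a setup failure can be approximated by scanning the runner output. The sketch below is a heuristic under stated assumptions: the pattern lists and the `classifyFailure` name are hypothetical and would need tuning for a specific test runner.

```javascript
// Heuristic patterns are assumptions; adjust them to your runner's output format.
const SETUP_ERROR_PATTERNS = [
  /SyntaxError/,
  /Cannot find module/,
  /ERR_MODULE_NOT_FOUND/,
];
const ASSERTION_PATTERNS = [/AssertionError/, /ERR_ASSERTION/];

// Returns 'setup-error' when the failure looks like broken syntax or wiring,
// 'red' when it looks like a failing assertion, and 'unknown' otherwise.
function classifyFailure(output) {
  if (SETUP_ERROR_PATTERNS.some(pattern => pattern.test(output))) {
    return 'setup-error';
  }
  if (ASSERTION_PATTERNS.some(pattern => pattern.test(output))) {
    return 'red';
  }
  return 'unknown';
}

console.log(classifyFailure('AssertionError: expected exit code 2, got 0'));
```

Only a `'red'` result should count as red evidence; `'unknown'` is best treated as a prompt to read the output by hand.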

Step 3: Implement minimum GREEN patch

Implement only what current red test requires.
No speculative APIs or unrelated cleanup.
Keep patch bounded to current scenario.

Step 4: Prove GREEN

Re-run narrow test command.
Run impacted suite (or package-level test set).
Confirm no regressions.

Flakiness Gate (mandatory for async, hook, or nondeterministic tests):

For tests that involve async I/O, stop hooks, timers, or file system operations, a single pass is insufficient. Require 3 consecutive passes before declaring GREEN:

# Run 3 times — all 3 must pass
node --test tests/hooks/routing-guard.test.cjs && \
node --test tests/hooks/routing-guard.test.cjs && \
node --test tests/hooks/routing-guard.test.cjs

A test that passes once and fails on the second run is RED, not GREEN. Do not advance to Step 5 until 3 consecutive passes are confirmed.
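
The same gate can be scripted from Node when the run count needs to be configurable. This is a minimal sketch; the `passesConsecutively` helper is a hypothetical name, not something the skill ships.

```javascript
const { spawnSync } = require('node:child_process');

// Runs `command args` up to `runs` times and stops at the first failure.
// Returns true only if every run exits with status 0.
function passesConsecutively(command, args, runs = 3) {
  for (let i = 0; i < runs; i += 1) {
    const result = spawnSync(command, args, { stdio: 'ignore' });
    if (result.status !== 0) return false;
  }
  return true;
}

// Usage (test path taken from the shell example above):
// passesConsecutively('node', ['--test', 'tests/hooks/routing-guard.test.cjs']);
```

Stopping at the first failure keeps the gate cheap: a flaky test is already RED, so further runs add no information.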

Mutation Testing Gate (security-critical code only):

For security hooks, routing validators, auth logic, and any code path that controls access or trust decisions, run Stryker mutation testing after achieving GREEN to verify that tests genuinely catch faults and are not vacuously passing.

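# Run Stryker mutation testing (target mutation score: 85%)
npx stryker run
# Enforce the threshold by setting "thresholds": { "break": 85 } in stryker.config.json,
# which makes the run exit non-zero when the mutation score falls below 85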

For fast-check-based property tests on security hooks, the fail-closed property is the mutation-equivalent gate:

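// fast-check fail-closed property: must hold for any input
const assert = require('node:assert');
const fc = require('fast-check');
// securityHook is the hook under test; require it from your own module.

fc.assert(
  fc.property(fc.anything(), input => {
    const result = securityHook(input);
    // Hook must NEVER return allow=true for malformed/unexpected input
    assert.notStrictEqual(result.allow, true);
  })
);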

Skip this gate for non-security application code (Step 4 → Step 5 directly).

Step 5: Optional refactor

Refactor only with green tests.
Re-run the same test set after refactor.

Step 5.5: Property-Based Testing (recommended for utility functions and security hooks)

After refactor (or after Step 4 for security-critical code), consider supplementing example-based tests with property-based tests. The cited study (arXiv:2506.18315) reports pass@1 improvements of 23.1–37.3% over example-based TDD alone for LLM code generation, attributing the gain to breaking the self-deception cycle in which generated tests share the flaws of the generated code.

When to invoke:

Utility functions (encode/decode, parsers, serializers, calculators)
Security hooks (input validators, sanitizers, access control logic)
Any function where invariants, round-trip properties, or mathematical properties can be stated

Invocation:

Skill({ skill: 'property-based-testing' });

Key property patterns to identify:

| Pattern | Example |
| --- | --- |
| Round-trip | `decode(encode(x)) === x` |
| Idempotence | `normalize(normalize(x)) === normalize(x)` |
| Invariant | `sort(arr).length === arr.length` |
| Fail-closed (security) | `securityHook(anyInput).allow !== true` (unless explicitly whitelisted) |

PBT is a supplement to Canon TDD, not a replacement. Canon RED/GREEN/REFACTOR completes first; PBT runs after GREEN is confirmed.

Step 6: Repeat until backlog empty

AI-Assisted Guardrails

Use tests as executable prompt context; keep prompts short and test-focused.
Prefer deterministic tests (stable fixtures, no nondeterministic ordering).
Use bounded repair loops: max 3 repair attempts per scenario before redesign.
Run anti-test-hacking checks:
- Verify changed assertions still express original requirement.
- Add at least one negative test for bug-fix tasks.
Ensure code does not branch on test-only artifacts.
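
The bounded repair loop above can be sketched as follows. `runTest` and `attemptRepair` are hypothetical callbacks supplied by the caller, and the `'redesign'` outcome label is an assumption about how the escalation would be signalled.

```javascript
// runTest() is assumed to return { passed: boolean, output: string }.
// attemptRepair(output, attemptNumber) is assumed to apply one candidate fix.
function boundedRepair(runTest, attemptRepair, maxAttempts = 3) {
  let result = runTest();
  let attempts = 0;
  while (!result.passed && attempts < maxAttempts) {
    attempts += 1;
    attemptRepair(result.output, attempts);
    result = runTest();
  }
  // After maxAttempts failed repairs, stop patching and go back to design.
  return result.passed
    ? { outcome: 'green', attempts }
    : { outcome: 'redesign', attempts };
}
```

Passing the raw test output to each repair attempt keeps the loop aligned with the test-as-prompt-context guardrail.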

Memory Acceleration Layer

Use lightweight memory only to reduce repeated setup and triage:

preferred repo-local test/lint/format commands
recurring failure signatures and short fix summaries
recurring anti-pattern reminders
reusable scenario templates

Reference: references/tdd-memory-profile.md

Hard rules:

memory never bypasses RED proof
memory never changes Canon sequence
keep profile bounded and low-noise

Test-Driven Prompting (TDP) — 2026 Standard Pattern

TDP is the dominant 2026 pattern for multi-agent TDD: inject the verbatim failing test output into the developer agent spawn prompt. This eliminates interpretation errors — the developer sees exactly what the test runner sees.

Pattern

Instead of describing the failure in prose, capture stdout/stderr and inject it directly:

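// Step 1: Run test and capture raw output
const { execSync } = require('node:child_process');
let testOutput = '';
try {
  // stdio: 'pipe' keeps stderr off the parent console so it is captured on failure
  execSync('node --test tests/hooks/routing-guard.test.cjs', { encoding: 'utf-8', stdio: 'pipe' });
} catch (e) {
  testOutput = `${e.stdout || ''}${e.stderr || ''}`; // Verbatim failure output
}
if (!testOutput) {
  // A passing test means RED was never proven, so there is nothing to inject
  throw new Error('Test did not fail: prove RED before spawning the developer agent.');
}

// Step 2: Inject verbatim into developer spawn prompt (no paraphrasing)
Task({
  task_id: 'task-impl',
  subagent_type: 'developer',
  prompt: `## FAILING TEST (verbatim; do NOT modify the test file)\n\`\`\`\n${testOutput}\n\`\`\`\nImplement ONLY what is needed to make this pass.`,
});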

Why TDP Works

Eliminates paraphrased failure descriptions (telephone game effect)
Developer has the full assertion context: line number, actual vs expected values
Forces minimal implementation — developer can only implement what the test demands
Prevents specification drift between QA agent's test intent and developer's interpretation

TDP + Multi-Agent TDD Decomposition

| Step | Agent | Action |
| --- | --- | --- |
| 1 | `qa` | Write failing test, commit test-only, capture raw output |
| 2 | Router | Extract test output, build TDP spawn prompt |
| 3 | `developer` | Implement to GREEN using verbatim test output as spec |
| 4 | `reflection-agent` | Verify no test assertions were modified (git diff check) |

Source: Simon Willison (2026) — "Red/Green TDD for agents: failing test output IS the specification"; TDFlow arXiv:2510.23761.
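
The git diff check in step 4 of the decomposition could look roughly like this. It is a coarse, file-level sketch: the test-file naming convention, the `modifiedTestFiles` and `verifyTestsUntouched` helpers, and the idea of passing the test-only commit as `baseRef` are all assumptions. Checking individual assertions would require inspecting the diff hunks themselves.

```javascript
const { execFileSync } = require('node:child_process');

// Assumed convention: test files live under tests/ or end in .test.js, .test.cjs, or .test.mjs.
const TEST_FILE_PATTERN = /(^|\/)tests\/|\.test\.(c|m)?js$/;

// Pure helper: given `git diff --name-only` output, return the touched test files.
function modifiedTestFiles(diffNameOnlyOutput) {
  return diffNameOnlyOutput
    .split('\n')
    .map(line => line.trim())
    .filter(line => line !== '' && TEST_FILE_PATTERN.test(line));
}

// Compares the working tree against `baseRef` (for example, the test-only commit
// from step 1) and reports whether any test file changed since then.
function verifyTestsUntouched(baseRef) {
  const output = execFileSync('git', ['diff', '--name-only', baseRef], {
    encoding: 'utf-8',
  });
  return modifiedTestFiles(output).length === 0;
}
```

Keeping the parsing in a pure function makes the check itself testable without a repository.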

Autonomous TDD with ralph-loop (Session-Persistent Iteration)

For repository-scale TDD where sessions may be interrupted, wire ralph-loop (Mode 2 — router-managed) to maintain the TDD scenario backlog across interruptions:

TDD State Schema

Maintain a TDD-specific state file at .claude/context/runtime/tdd-state.json:

{
  "scenarios": [
    {
      "id": "sc-001",
      "description": "routing-guard blocks Write on creator paths",
      "status": "pending"
    },
    { "id": "sc-002", "description": "spawn-token-guard warns at 80K tokens", "status": "green" }
  ],
  "completedScenarios": [
    {
      "id": "sc-002",
      "evidenceCommand": "node --test tests/hooks/spawn-token-guard.test.cjs",
      "passedAt": "2026-03-12T10:00:00Z"
    }
  ],
  "currentScenario": "sc-001",
  "evidenceLog": [
    {
      "scenarioId": "sc-001",
      "phase": "red",
      "output": "AssertionError: expected exit code 2, got 0",
      "timestamp": "..."
    }
  ]
}
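
A state transition over this schema might be written as a pure function. The sketch below is an assumption about how the fields relate: the `recordGreen` name is hypothetical, and the rule that `currentScenario` advances to the first `pending` scenario is not stated in the source.

```javascript
// Returns a new state object with the scenario marked green and logged as completed.
// The input state is not mutated.
function recordGreen(state, scenarioId, evidenceCommand, passedAt) {
  const scenarios = (state.scenarios || []).map(s =>
    s.id === scenarioId ? { ...s, status: 'green' } : s
  );
  const completedScenarios = [
    ...(state.completedScenarios || []).filter(s => s.id !== scenarioId),
    { id: scenarioId, evidenceCommand, passedAt },
  ];
  // Assumed rule: the next current scenario is the first one still pending.
  const next = scenarios.find(s => s.status === 'pending');
  return {
    ...state,
    scenarios,
    completedScenarios,
    currentScenario: next ? next.id : null,
  };
}
```

Returning a fresh object keeps the previous state intact until the new one is written to disk, which is convenient if a session is interrupted mid-update.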

Resume Pattern

At the start of each iteration, read the TDD state file:

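// Step 0: before building/refreshing backlog
const fs = require('node:fs');
const STATE_PATH = '.claude/context/runtime/tdd-state.json';
// A missing or empty state file means a fresh run with no completed scenarios
const raw = fs.existsSync(STATE_PATH) ? fs.readFileSync(STATE_PATH, 'utf-8') : '';
const state = JSON.parse(raw.trim() || '{}');
const completedIds = (state.completedScenarios || []).map(s => s.id);
const remaining = (state.scenarios || []).filter(s => !completedIds.includes(s.id));
// Pick next scenario from remaining; never re-run completed ones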

Integration with ralph-loop Mode 2

Router spawns qa agent with { task_id, subagent_type: 'qa', prompt: TDP_PROMPT + verbatim state }
qa writes test → runs → captures output → updates tdd-state.json (phase: red)
Router spawns developer with TDP prompt (verbatim test output injected)
developer i

Content truncated.


