test-audit

Name: test-audit
Author: T-rav

Install

mkdir -p .claude/skills/test-audit && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/14843" && unzip -o skill.zip -d .claude/skills/test-audit && rm skill.zip

Installs to .claude/skills/test-audit

Activation

This is the description your AI agent reads to decide when to run this skill — the better it matches your request, the more reliably it fires.

Audits test code for quality, structure, and patterns. Checks test naming, 3As structure, factory and builder patterns, and identifies anti-patterns like multiple assertions or over-mocking. Based on real codebase with 63 factories and 12 builders.

248 charsno explicit “when” trigger

About this skill

Test Audit Agent

Comprehensive test quality auditing agent that analyzes test files for adherence to established patterns, identifies anti-patterns, and suggests improvements based on the codebase's own testing standards.

Agent Purpose

Audit test files across all services (bot, tasks, control_plane, agent-service) to ensure:

Naming conventions - Files and tests follow established patterns
3As structure - Arrange, Act, Assert clarity
Single responsibility - One logical assertion per test
Builders & Factories - Proper use and identification of missing opportunities
Anti-pattern detection - Multiple assertions, over-mocking, incomplete setups

Established Standards (From Codebase Analysis)

1. Factory Pattern

Location: bot/tests/utils/mocks/, bot/tests/utils/settings/, tasks/tests/factories.py

Pattern:

class LLMProviderMockFactory:
    """Factory for creating mock LLM provider objects"""

    @staticmethod
    def create_mock_provider(response: str = "Test response") -> MagicMock:
        """Create basic mock LLM provider"""
        provider = MagicMock()
        provider.get_response = AsyncMock(return_value=response)
        provider.model = "test-model"
        return provider

    @staticmethod
    def create_provider_with_error(error: Exception) -> MagicMock:
        """Create LLM provider that raises errors"""
        provider = LLMProviderMockFactory.create_mock_provider()
        provider.get_response = AsyncMock(side_effect=error)
        return provider

Standards:

Static methods only
Clear method names: create_<thing>, create_<thing>_with_<condition>
Composition: specialized factories call basic factory
Return typed objects (MagicMock, AsyncMock, or real objects)
Docstrings for each method

2. Builder Pattern

Location: bot/tests/utils/builders/

Pattern:

class MessageBuilder:
    """Builder for creating message objects with fluent API"""

    def __init__(self):
        self._role = "user"
        self._content = "Test message"

    def as_user(self):
        """Set role as user"""
        self._role = "user"
        return self

    def as_assistant(self):
        """Set role as assistant"""
        self._role = "assistant"
        return self

    def with_content(self, content: str):
        """Set message content"""
        self._content = content
        return self

    def build(self) -> dict[str, Any]:
        """Build the message"""
        return {"role": self._role, "content": self._content}

Standards:

Instance methods that return self for chaining
as_<state>() methods for role/type
with_<attribute>() methods for properties
Final build() method returns the object
Can include class methods for common scenarios: @classmethod def user_message(cls, content: str)

3. Test Naming Convention

Pattern: test_<method/feature>_<scenario>[_<expected_result>]

Examples from codebase:

✅ test_send_message_success - Method + Scenario
✅ test_send_message_with_blocks - Method + Condition
✅ test_conversation_id_with_thread - Feature + Condition
✅ test_invoke_agent_404_not_found - Method + HTTP status
❌ test_1 - Meaningless
❌ test_user_test - Redundant "test"

4. 3As Structure

Good Example from codebase:

@pytest.mark.asyncio
async def test_send_message_success(self):
    """Test successful message sending"""
    # Arrange
    channel = "C12345"
    text = "Hello, World!"
    thread_ts = "1234567890.123456"
    expected_response = {"ts": "1234567890.654321"}

    slack_service = SlackServiceFactory.create_service_with_successful_client(
        expected_response
    )

    # Act
    result = await slack_service.send_message(channel, text, thread_ts)

    # Assert
    assert result == expected_response
    slack_service.client.chat_postMessage.assert_called_once_with(
        channel=channel, text=text, thread_ts=thread_ts, blocks=None, mrkdwn=True
    )

Standards:

Clear separation of setup, execution, verification
Comments (# Arrange, # Act, # Assert) optional but helpful
All arrange code before act
All assertions after act
No mixing of phases

5. Single Responsibility

Good:

def test_user_creation_sets_name():
    user = create_user({"name": "John"})
    assert user.name == "John"

def test_user_creation_sets_email():
    user = create_user({"email": "[email protected]"})
    assert user.email == "[email protected]"

Bad:

def test_user_creation():
    user = create_user({"name": "John", "email": "[email protected]"})
    assert user.name == "John"
    assert user.email == "[email protected]"
    assert user.is_active is True  # Testing 3 different things
    assert user.created_at is not None

6. Factory + Fixture Pattern

From conftest.py:

# Factory class (reusable logic)
class SettingsMockFactory:
    @staticmethod
    def create_basic_settings() -> MagicMock:
        settings = MagicMock()
        settings.slack.bot_token = "test-token"
        return settings

# Fixture (pytest integration)
@pytest.fixture
def mock_settings():
    """Mock application settings with test values."""
    return SettingsMockFactory.create_basic_settings()

Standards:

Factories can be used independently or in fixtures
Fixtures provide pytest integration
Tests can call factory directly if they need customization
Session-scoped fixtures for expensive setup

7. Test Coverage Analysis (NEW)

Critical - Untested Code Paths:

# ❌ CRITICAL - Critical function with no tests
# File: bot/services/payment_processor.py
def process_payment(user_id: int, amount: float, card_token: str) -> dict:
    """Process payment for user."""
    charge = stripe.Charge.create(
        amount=int(amount * 100),
        currency="usd",
        source=card_token
    )

    user = User.query.get(user_id)
    user.balance += amount
    user.save()

    send_receipt_email(user.email, charge.id)
    return {"status": "success", "charge_id": charge.id}

# No tests found for this function!
# Missing tests for:
# - Happy path (successful payment)
# - Error cases (invalid card, insufficient funds, stripe API error)
# - Edge cases (negative amount, non-existent user, amount = 0)
# - Side effects (user balance updated, email sent)

How to detect:

# Generate coverage report
pytest --cov=bot --cov-report=term-missing

# Find functions/files with <70% coverage
pytest --cov=bot --cov-report=html
# Open htmlcov/index.html and find files in red

# Analyze specific module
pytest --cov=bot.services.payment_processor --cov-report=term-missing

High - Missing Edge Case Tests:

# ✅ GOOD - Has happy path test
def test_divide_positive_numbers():
    assert divide(10, 2) == 5

# ❌ MISSING - No edge case tests
# Missing tests for:
# - divide(10, 0) → Should raise ValueError
# - divide(0, 10) → Should return 0
# - divide(-10, 2) → Should return -5
# - divide(10, -2) → Should return -5
# - divide(1, 3) → Should handle float precision

Medium - Untested Error Paths:

# Production code
def get_user(user_id: int) -> User:
    user = User.query.get(user_id)
    if not user:
        raise UserNotFoundError(f"User {user_id} not found")
    return user

# ✅ Test exists for happy path
def test_get_user_success():
    user = get_user(123)
    assert user.id == 123

# ❌ MISSING - No error path test
# Missing test:
def test_get_user_not_found():
    with pytest.raises(UserNotFoundError, match="User 999 not found"):
        get_user(999)

Detection Patterns:

# CRITICAL severity
- Functions in critical modules (payment, auth, security) with 0% coverage
- Error handling code (try/except blocks) never executed in tests
- Database operations (create, update, delete) with no tests

# HIGH severity
- Functions with <50% coverage
- Missing tests for all error paths
- No tests for input validation

# MEDIUM severity
- Functions with 50-70% coverage
- Missing tests for edge cases (null, empty, boundary values)
- Integration paths not covered

# Detection Commands:
pytest --cov=bot --cov-report=term-missing | grep "0%"
grep -r "def " --include="*.py" bot/services/ | wc -l  # Count functions
grep -r "def test_" --include="*.py" bot/tests/ | wc -l  # Count tests
# Compare ratio: should be at least 3 tests per function

8. Test Performance & Reliability (NEW)

Critical - Flaky Tests:

# ❌ CRITICAL - Time-dependent test (flaky)
def test_async_processing():
    start_processing(task_id=123)
    time.sleep(0.5)  # Flaky! Sometimes 0.5s isn't enough
    result = get_result(task_id=123)
    assert result.status == "complete"

# ✅ GOOD - Wait with timeout
import pytest

@pytest.mark.asyncio
async def test_async_processing():
    await start_processing(task_id=123)

    # Poll with timeout
    for _ in range(10):  # Max 5 seconds
        result = await get_result(task_id=123)
        if result.status == "complete":
            break
        await asyncio.sleep(0.5)
    else:
        pytest.fail("Processing did not complete in 5 seconds")

    assert result.status == "complete"

# ✅ BETTER - Use async/await properly
@pytest.mark.asyncio
async def test_async_processing():
    result = await start_processing(task_id=123)
    # Awaits completion, no sleep needed
    assert result.status == "complete"

# ❌ CRITICAL - Random behavior (non-deterministic)
def test_random_selection():
    items = [1, 2, 3, 4, 5]
    selected = random.choice(items)
    assert selected > 3  # Fails 40% of the time!

# ✅ GOOD - Seed random or mock it
def test_random_selection():
    items = [1, 2, 3, 4, 5]
    random.seed(42)  # Deterministic
    selected = random.choice(items)
    assert selected == 5  # Always same result

# ✅ BETTER - Mock random
from unittest.mock import patch

def test_random_selection():
    items = [1, 2, 3, 4, 5]


---

*Content truncated.*

More by T-rav

View all by T-rav →

code-quality-enforcer

T-rav

hf.warn-new-file-creation

T-rav

hf.warn-new-file-creation

hf.track-reindex-needed

T-rav

hf.track-reindex-needed

Install

mkdir -p .claude/skills/test-audit && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/14843" && unzip -o skill.zip -d .claude/skills/test-audit && rm skill.zip

Installs to .claude/skills/test-audit

Safety

Review before install

Runs shell / code

Automated static scan of the SKILL.md and repo. A flag describes what the skill can do — not a verdict. Always review code before installing.

Source & maintenance

Updated

3mo ago

License

Apache-2.0

Repo stars

Loads

~8,130 tokens

Stars are for the whole repository, not this skill alone.

Stats

Views

Installs

Author

T-rav

4 skills published

Links

Source code

test-audit

Install

Activation

About this skill

Test Audit Agent

Agent Purpose

Established Standards (From Codebase Analysis)

1. Factory Pattern

2. Builder Pattern

3. Test Naming Convention

4. 3As Structure

5. Single Responsibility

6. Factory + Fixture Pattern

7. Test Coverage Analysis (NEW)

8. Test Performance & Reliability (NEW)

More by T-rav

code-quality-enforcer

hf.warn-new-file-creation

hf.track-reindex-needed

Search skills