database-design

Name: database-design
Author: Harmitx7

Database design principles and decision-making. Schema design, indexing strategy, ORM selection, serverless databases.

Install

mkdir -p .claude/skills/database-design-harmitx7 && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/13848" && unzip -o skill.zip -d .claude/skills/database-design-harmitx7 && rm skill.zip

Installs to .claude/skills/database-design-harmitx7

Activation

This is the description your AI agent reads to decide when to run this skill — the better it matches your request, the more reliably it fires.

Database design principles and decision-making. Schema design, indexing strategy, ORM selection, serverless databases.

118 charsno explicit “when” trigger

About this skill

Database Design Principles

A schema is cheap to design and expensive to migrate. Design it right for the queries your app actually runs.

Core Decision: What Database?

Before schema design, the database type must be justified — not assumed.

Need	Consider
Relational data with integrity constraints	PostgreSQL (default choice for most apps)
Horizontal write scaling, flexible schema	MongoDB, DynamoDB
Sub-millisecond reads, ephemeral/session data	Redis, Upstash
Full-text search as primary use case	Elasticsearch, Typesense
Serverless, zero-ops, edge-deployable	Turso, PlanetScale, Neon
Time-series events	InfluxDB, TimescaleDB
Semantic / vector similarity search	pgvector (in PostgreSQL), Qdrant, Pinecone

Default when uncertain: PostgreSQL. It handles relational, JSON, full-text, and time-series use cases well enough that you rarely need to deviate for most applications.

Vector Database Patterns

AI applications need semantic search — finding documents by meaning, not keyword. Vector databases store high-dimensional embeddings and search them by similarity.

pgvector — Stay in PostgreSQL

-- Enable extension once
CREATE EXTENSION IF NOT EXISTS vector;

-- Add embedding column to existing table
ALTER TABLE documents ADD COLUMN embedding vector(1536);  -- 1536 for text-embedding-3-small

-- IVFFlat index for approximate nearest neighbor search
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
-- lists = sqrt(num_rows) is a good starting point

-- Query: find 5 most semantically similar documents
SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY embedding <=> $1  -- cosine distance operator
LIMIT 5;

Dedicated Vector DB: When pgvector Isn't Enough

Trigger to Upgrade	Recommended
> 1M vectors + sub-10ms p99	Qdrant (self-hosted, Rust) or Pinecone (managed)
Multimodal (text + images)	Weaviate
Managed, predictable pricing	Pinecone
Zero-ops prototype	ChromaDB (local)

Chunking + Storage Best Practice

// Always store both the raw text AND the embedding — embeddings are not reversible
await db.query(`
  INSERT INTO documents (content, source_url, chunk_index, embedding)
  VALUES ($1, $2, $3, $4)
`, [chunkText, sourceUrl, chunkIndex, JSON.stringify(embedding)]);
// embedding is float[] — serialize to JSON for parameterized query

Schema Design Rules

Model for queries, not for elegance

The most normalized schema is not always the right schema. Ask: what does the application actually read?

Design the schema to make the most frequent, performance-critical queries fast — even if that means some denormalization.

Naming Conventions

-- Tables: plural, snake_case
CREATE TABLE user_sessions (...);

-- Primary keys: always "id"
id UUID PRIMARY KEY DEFAULT gen_random_uuid();

-- Foreign keys: {referenced_table_singular}_id
user_id UUID REFERENCES users(id);

-- Timestamps: always include both
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW();

-- Booleans: is_ prefix
is_active BOOLEAN NOT NULL DEFAULT TRUE;

Required on Every Table

id          UUID PRIMARY KEY    -- or BIGSERIAL for high-insert tables
created_at  TIMESTAMPTZ         -- immutable creation time
updated_at  TIMESTAMPTZ         -- changes on every update (trigger or ORM)

Indexing Strategy

An index makes reads faster and writes slightly slower. Index on the columns you filter and sort — not every column.

Index when:

Column appears in WHERE clauses frequently
Column is used for JOIN conditions
Column is used in ORDER BY on large result sets
Column is a foreign key that will be queried by relationship

Don't index when:

Table has under a few thousand rows — full scan is faster than index lookup overhead
Column has very low cardinality (e.g., a boolean field with 95% TRUE)
Column is rarely queried

-- Composite index: order matters — most selective first
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- Partial index: only index what you query
CREATE INDEX idx_active_users ON users(email) WHERE is_active = TRUE;

N+1 Queries

The most common ORM performance failure. N+1 happens when you fetch N records then make a separate query for each one.

// ❌ N+1 — 1 query for posts + N queries for authors
const posts = await Post.findAll();
for (const post of posts) {
  post.author = await User.findById(post.userId); // N queries
}

// ✅ Eager load — 2 queries total
const posts = await Post.findAll({ include: ['author'] });

Detection: Enable query logging in development. If you see repetitive queries differing only by ID, you have N+1.

Migration Rules

Every schema change is a migration — never modify the database directly in production
Migrations are additive first: add the column, deploy code that uses it, then remove the old column later
Never drop a column in the same migration that deploys the code removing its references
Test migrations on a production-size dataset — a 10-second migration on dev can lock a table for hours on prod

File Index

File	Covers	Load When
`schema-design.md`	Detailed schema patterns and relationship modeling	Designing or reviewing a schema
`indexing.md`	When and how to index, partial indexes, covering indexes	Performance investigation
`orm-selection.md`	Prisma vs Drizzle vs TypeORM vs raw SQL trade-offs	Choosing ORM
`migrations.md`	Safe migration patterns, rollback strategy	Changing existing schema
`optimization.md`	Query analysis, EXPLAIN output, common fixes	Slow query diagnosis
`database-selection.md`	Detailed database selection framework	Architecture decision

Scripts

Script	Purpose	Run With
`scripts/schema_validator.py`	Validates schema for missing indexes, naming issues	`python scripts/schema_validator.py <project_path>`

Output Format

When this skill produces a recommendation or design decision, structure your output as:

━━━ Database Design Recommendation ━━━━━━━━━━━━━━━━
Decision:    [what was chosen / proposed]
Rationale:   [why — one concise line]
Trade-offs:  [what is consciously accepted]
Next action: [concrete next step for the user]
─────────────────────────────────────────────────
Pre-Flight:  ✅ All checks passed
             or ❌ [blocking item that must be resolved first]

🏛️ Tribunal Integration (Anti-Hallucination)

Slash command: /tribunal-database Active reviewers: logic · security · sql

❌ Forbidden AI Tropes in Database Design

Blindly guessing column types — e.g., using VARCHAR(255) for everything instead of precise types or TEXT.
Missing updated_at triggers — defining updated_at without a mechanism to actually update it.
N+1 queries by default — returning code that queries relations in a loop.
Destructive migrations — dropping a column in the same migration that drops the code using it.
Over-indexing — adding indexes to every single column regardless of cardinality or query patterns.

✅ Pre-Flight Self-Audit

Review these questions before generating database schemas or queries:

✅ Did I design for the queries the application actually runs, rather than theoretical elegance?
✅ Are my suggested indexes selective and actually used in `WHERE` or `JOIN` clauses?
✅ Is this code safe from N+1 query performance problems?
✅ Did I rely on parameterized queries (no string concatenation)?
✅ Did I use the correct primary key strategy (e.g., UUID vs BIGSERIAL) for the scale?

🤖 LLM-Specific Traps

AI coding assistants often fall into specific bad habits when dealing with this domain. These are strictly forbidden:

Over-engineering: Proposing complex abstractions or distributed systems when a simpler approach suffices.
Hallucinated Libraries/Methods: Using non-existent methods or packages. Always // VERIFY or check package.json / requirements.txt.
Skipping Edge Cases: Writing the "happy path" and ignoring error handling, timeouts, or data validation.
Context Amnesia: Forgetting the user's constraints and offering generic advice instead of tailored solutions.
Silent Degradation: Catching and suppressing errors without logging or re-raising.

🏛️ Tribunal Integration (Anti-Hallucination)

Slash command: /review or /tribunal-full Active reviewers: logic-reviewer · security-auditor

❌ Forbidden AI Tropes

Blind Assumptions: Never make an assumption without documenting it clearly with // VERIFY: [reason].
Silent Degradation: Catching and suppressing errors without logging or handling.
Context Amnesia: Forgetting the user's constraints and offering generic advice instead of tailored solutions.

✅ Pre-Flight Self-Audit

Review these questions before confirming output:

✅ Did I rely ONLY on real, verified tools and methods?
✅ Is this solution appropriately scoped to the user's constraints?
✅ Did I handle potential failure modes and edge cases?
✅ Have I avoided generic boilerplate that doesn't add value?

🛑 Verification-Before-Completion (VBC) Protocol

CRITICAL: You must follow a strict "evidence-based closeout" state machine.

❌ Forbidden: Declaring a task complete because the output "looks correct."
✅ Required: You are explicitly forbidden from finalizing any task without providing concrete evidence (terminal output, passing tests, compile success, or equivalent proof) that your output works as intended.

More by Harmitx7

View all by Harmitx7 →

i18n-localization

Harmitx7

Internationalization (i18n) and localization mastery. Abstracting hardcoded strings, managing JSON/YAML translation dictionaries, bidirectional routing (RTL support for Arabic/Hebrew), Pluralization algorithms, date/currency formatting, and SSR locale detection in Next.js/React. Use when preparing a

code-review-checklist

Harmitx7

Code review guidelines covering code quality, security, and best practices.

performance-profiling

Harmitx7

Performance profiling mastery. Core Web Vitals (LCP, CLS, INP), Lighthouse auditing, JavaScript profiling, React rendering optimization, bundle analysis, memory leak detection, database query profiling (EXPLAIN ANALYZE), load testing, and performance budgets. Use when optimizing performance, debuggi

web-accessibility-auditor

Harmitx7

Web Accessibility (a11y) mastery. WCAG 2.2 AA standards, semantic HTML, ARIA attributes, keyboard navigation, focus management, screen reader compatibility, color contrast, and dynamic content announcements. Use when building UI components or auditing frontend code for accessibility compliance.

data-validation-schemas

Harmitx7

Data validation and schema design mastery. Zod, Yup, Joi, Valibot, and Pydantic schema design, runtime type checking, API boundary validation, form validation patterns, DTO design, schema composition, error message formatting, schema evolution strategies, and coercion rules. Use when validating user

plan-writing

Harmitx7

Technical design and implementation planning mastery. Writing structured execution checklists, dependency mapping, establishing rollback protocols, segmenting monolithic tasks, writing ADRs (Architecture Decision Records), and defining verification criteria. Use when transitioning from ideation to c

Install

mkdir -p .claude/skills/database-design-harmitx7 && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/13848" && unzip -o skill.zip -d .claude/skills/database-design-harmitx7 && rm skill.zip

Installs to .claude/skills/database-design-harmitx7

Safety

Review before install

Bundles scripts

Automated static scan of the SKILL.md and repo. A flag describes what the skill can do — not a verdict. Always review code before installing.

Source & maintenance

Updated

3mo ago

Repo stars

Loads

~2,456 tokens

Stars are for the whole repository, not this skill alone.

Stats

Views

Installs

Author

Harmitx7

7 skills published

Links

Source code

database-design

Install

Activation

About this skill

Database Design Principles

Core Decision: What Database?

Vector Database Patterns

pgvector — Stay in PostgreSQL

Dedicated Vector DB: When pgvector Isn't Enough

Chunking + Storage Best Practice

Schema Design Rules

Model for queries, not for elegance

Naming Conventions

Required on Every Table

Indexing Strategy

N+1 Queries

Migration Rules

File Index

Scripts

Output Format

🏛️ Tribunal Integration (Anti-Hallucination)

❌ Forbidden AI Tropes in Database Design

✅ Pre-Flight Self-Audit

🤖 LLM-Specific Traps

🏛️ Tribunal Integration (Anti-Hallucination)

❌ Forbidden AI Tropes

✅ Pre-Flight Self-Audit

🛑 Verification-Before-Completion (VBC) Protocol

More by Harmitx7

i18n-localization

code-review-checklist

performance-profiling

web-accessibility-auditor

data-validation-schemas

plan-writing

Search skills