code-analysis
Analyze unfamiliar codebases using a structured 5-phase method: entry points, dependencies, functions, data flow, and integration points.
Install
mkdir -p .claude/skills/code-analysis && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/15027" && unzip -o skill.zip -d .claude/skills/code-analysis && rm skill.zip
Installs to .claude/skills/code-analysis
Code Analysis Skill
Purpose
A systematic methodology for understanding unfamiliar codebases and complex modules. Used by the Code-Reader Agent.
When to Apply
- Analyzing the A1111 postprocessing-for-training module
- Exploring waifuc, gradio, or other core dependencies
- Understanding existing dataset-cat components
- Reverse-engineering data flow in complex pipelines
5-Phase Analysis Process
Phase 1: Entry Point Discovery (10-15 min)
Goal: Find where code execution starts and understand high-level flow
- Locate the main entry point (check `__main__.py`, `main()`, `if __name__ == '__main__'`)
- Read function signatures at the top level
- Identify primary classes and their relationships
- Note major data structures and their purpose
- Build a 30,000-foot mental model of the module
Tools: grep_search (find entry points), read_file (top 50 lines)
Output: "This module's main entry is X, it does Y by calling Z components"
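For Python code, the Phase 1 checks can be approximated offline with the standard-library `ast` module. This is a minimal sketch and not part of the skill's tooling. `find_entry_points` and `find_main_modules` are hypothetical helpers. They inspect only top-level statements, so entry points declared elsewhere will not appear. Examples include console scripts in packaging metadata and a Gradio `launch()` call inside a function.

```python
import ast
from pathlib import Path


def find_main_modules(root: str) -> list[str]:
    """Return every __main__.py under `root` (packages runnable with `python -m`)."""
    return sorted(str(p) for p in Path(root).rglob("__main__.py"))


def _is_main_guard(test: ast.expr) -> bool:
    """True for `__name__ == "__main__"`, in either operand order."""
    if not (isinstance(test, ast.Compare) and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Eq)):
        return False
    operands = [test.left, test.comparators[0]]
    has_name = any(isinstance(o, ast.Name) and o.id == "__name__" for o in operands)
    has_main = any(isinstance(o, ast.Constant) and o.value == "__main__" for o in operands)
    return has_name and has_main


def find_entry_points(source: str) -> dict:
    """Summarize a module's top-level functions, classes, and entry-point markers."""
    summary = {"main_guard": False, "has_main": False, "functions": [], "classes": []}
    for node in ast.parse(source).body:  # top-level statements only
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            summary["functions"].append(node.name)
            summary["has_main"] = summary["has_main"] or node.name == "main"
        elif isinstance(node, ast.ClassDef):
            summary["classes"].append(node.name)
        elif isinstance(node, ast.If) and _is_main_guard(node.test):
            summary["main_guard"] = True
    return summary


example = '''
class Pipeline:
    pass

def main():
    Pipeline()

if __name__ == "__main__":
    main()
'''
print(find_entry_points(example))
# {'main_guard': True, 'has_main': True, 'functions': ['main'], 'classes': ['Pipeline']}
```

The summary feeds the one-line output directly: entry point, top-level classes, and the functions those classes are wired through.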
Phase 2: Dependency Mapping (10-20 min)
Goal: Understand external dependencies and their roles
- List all imports at the top of the file
- For each third-party import, note its purpose:
  - image processing: pillow, opencv-python, etc.
  - ML/models: torch, torchvision, basicsr, realesrgan
  - data: waifuc, gallery-dl
  - UI: gradio
- Identify which dependencies are critical vs. optional
- Check versions if available (requirements.txt, setup.py)
Tools: read_file (imports section), grep_search (import usage)
Output: "This uses X library for Y, and Z library for W"
Phase 3: Functional Decomposition (15-30 min)
Goal: Understand what each major component does
For each significant function/class:
- Read docstring (purpose, inputs, outputs)
- Scan the implementation (at most about 100 lines on a first pass)
- Note whether it handles:
  - Data loading
  - Data transformation
  - Configuration handling
  - UI interaction
  - File I/O
  - External API calls
- Document key parameters and return values
Tools: semantic_search (find related functions), read_file (function bodies)
Output: Bulleted list of "Function X: Does Y with inputs A, B and returns C"
Phase 4: Data Flow Tracing (20-40 min)
Goal: Understand how data moves through the system
- Start from input (file, user input, API call)
- Follow data through transformations:
  - What format is it in at each stage?
  - What operations change it?
  - Where are the branch points (different code paths)?
- End at output (file, return value, UI display)
- Note any state mutations or side effects
Tools: grep_search (variable tracking), read_file (transformation code), semantic_search (related operations)
Output: Diagram or text flow like "Input image → Resize → Crop → Save"
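The static variable tracking that grep_search provides can be sketched offline with `ast`. `trace_variable` is a hypothetical helper that lists where a name is bound as a parameter, read, assigned, or deleted. On a single line it orders the read before the assignment, matching runtime order in `image = resize(image)`. It does not follow aliases, attributes, loops, or calls into other functions, so it only narrows down which transformation code to read.

```python
import ast

# Same-line ordering: a parameter binds first, then reads happen before the assignment.
ORDER = {"param": 0, "read": 1, "assign": 2, "delete": 3}
KIND = {ast.Load: "read", ast.Store: "assign", ast.Del: "delete"}


def trace_variable(source: str, name: str) -> list[tuple[int, str]]:
    """List (line, kind) events for a plain variable name, ordered by line."""
    events = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.arg) and node.arg == name:
            events.append((node.lineno, "param"))
        elif isinstance(node, ast.Name) and node.id == name:
            events.append((node.lineno, KIND[type(node.ctx)]))
    return sorted(events, key=lambda e: (e[0], ORDER[e[1]]))


pipeline = '''
def process(image):
    image = resize(image)
    image = crop(image)
    save(image)
'''
for line, kind in trace_variable(pipeline, "image"):
    print(line, kind)
```

Each read/assign pair marks one transformation stage, which maps directly onto a flow like "Input image → Resize → Crop → Save".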
Phase 5: Integration Point Analysis (10-20 min)
Goal: Determine how to connect this code to dataset-cat
- What are expected inputs? (data types, formats, sizes)
- What are outputs? (data types, formats, where stored)
- Are there configuration/parameter interfaces?
- Are there any callbacks or event hooks?
- What's the error handling strategy?
- Are there any assumptions about the environment?
Tools: grep_search (look for config handlers, error handlers), read_file (parameter definitions)
Output: "Integration requires: X format input, Y format output, Z configuration options"
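For Python functions, the parameter interface can be listed with `inspect.signature`. In this sketch, `describe_interface` is a hypothetical helper and `postprocess` is a made-up stand-in for the code under analysis. A parameter such as `on_error` is the kind of hook the callback question above looks for. The helper requires an importable function, so it suits modules that can be imported without side effects.

```python
import inspect


def _fmt(annotation) -> str:
    """Readable form of an annotation."""
    if annotation is inspect.Parameter.empty:
        return "unannotated"
    if isinstance(annotation, type) and not getattr(annotation, "__args__", None):
        return annotation.__name__  # plain class such as int or bytes
    return str(annotation)  # generics, typing constructs, string annotations


def describe_interface(func) -> list[str]:
    """List each parameter with its annotation and default, then the return type."""
    sig = inspect.signature(func)
    lines = []
    for p in sig.parameters.values():
        if p.default is inspect.Parameter.empty:
            lines.append(f"{p.name}: {_fmt(p.annotation)} (required)")
        else:
            lines.append(f"{p.name}: {_fmt(p.annotation)} = {p.default!r}")
    lines.append(f"returns: {_fmt(sig.return_annotation)}")
    return lines


# Hypothetical integration target, defined here only to demonstrate the helper.
def postprocess(image: bytes, size: int = 512, crop: str = "center", on_error=None) -> bytes:
    return image


for line in describe_interface(postprocess):
    print(line)
```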
Documentation Template
Create a markdown file with:
# [Module Name] Analysis
## High-Level Purpose
[1-2 sentences about what it does]
## Entry Point(s)
- `function_name()` in file.py
## Key Components
- **Class/Function Name**: Brief purpose
  - Inputs: Type and format
  - Outputs: Type and format
  - Key parameters: What they control
## Dependencies
- `library_name`: Purpose and critical operations
## Data Flow
[Text or ASCII diagram showing transformation pipeline]
## Integration Points
- Input format: [description]
- Output format: [description]
- Configuration: [parameters and defaults]
- Error handling: [strategy]
- Environment assumptions: [any special setup needed]
## Lessons for Implementation
- [Potential pitfalls]
- [Design patterns used]
- [Reusable components]
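If findings are collected programmatically, a small renderer can fill in the template. This sketch rests on several assumptions. `render_analysis` is hypothetical, and its section names are copied from the template above. Any section without findings gets a `TODO` line so that gaps stay visible. Extra sections, such as "Integration Recommendations", are appended after the template's own.

```python
# Section headings copied from the documentation template above.
SECTIONS = [
    "High-Level Purpose", "Entry Point(s)", "Key Components", "Dependencies",
    "Data Flow", "Integration Points", "Lessons for Implementation",
]


def render_analysis(module_name: str, findings: dict[str, list[str]]) -> str:
    """Fill the template; each findings value is a list of lines placed under its heading."""
    out = [f"# {module_name} Analysis", ""]
    extras = [s for s in findings if s not in SECTIONS]  # e.g. "Integration Recommendations"
    for section in SECTIONS + extras:
        out.append(f"## {section}")
        out.extend(findings.get(section) or ["TODO"])  # keep gaps visible
        out.append("")
    return "\n".join(out)


print(render_analysis("waifuc", {
    "Dependencies": ["- `library_name`: [purpose]"],
    "Integration Recommendations": ["- [recommendation for Code-Migrator]"],
}))
```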
Common Pitfalls to Avoid
- Over-reading: Don't read every line. Skim, then deep-dive only on critical functions
- Missing dependencies: Always check which imports a function uses
- Assuming flow: Trace actual code paths; don't assume linear execution
- Ignoring config: Look for configuration systems that affect behavior
- Missing side effects: Watch for mutations, file I/O, or network calls
Tools Usage Tips
- grep_search: Find function definitions and usages quickly
- semantic_search: When you need "find all code that handles X"
- read_file: Study the logic rather than just browsing; read with context (ranges of 50+ lines)
- file_search: Locate files when you only know partial names
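The tools above belong to the agent environment. Outside it, a plain regex search can roughly stand in for grep_search's "find definitions and usages" query. `find_symbol` is a hypothetical helper. Because it works on text, it also matches names in comments and strings, and it misses names built dynamically.

```python
import re
from pathlib import Path


def find_symbol(root: str, name: str) -> dict[str, list[str]]:
    """Regex stand-in for a definition/usage search over the .py files under `root`."""
    word = re.escape(name)
    def_pattern = re.compile(rf"^\s*(?:async\s+def|def|class)\s+{word}\b")
    use_pattern = re.compile(rf"\b{word}\b")
    hits = {"definitions": [], "usages": []}
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if def_pattern.match(line):
                hits["definitions"].append(f"{path}:{lineno}")
            elif use_pattern.search(line):
                hits["usages"].append(f"{path}:{lineno}")
    return hits
```

Each hit has the form `path:line`, ready to pass to read_file with a surrounding range for context.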
Time Budgets
- Small file/function: 10-15 minutes total
- Medium module: 30-45 minutes total
- Large codebase: 60-90 minutes (may need to be split across multiple sessions, each with its own focus area)
Memory Archiving
Always end an analysis by storing the findings:
- Save to `/memories/repo/code-analysis-[module-name].md`
- Include all sections from the documentation template
- Add an "Integration Recommendations" section for Code-Migrator
- Keep findings readable and scannable (short bullets, examples)
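The archiving step can be sketched as below, on the assumption that the memory store is writable as an ordinary directory. Inside an agent, the memory tool may be the only way to reach `/memories`. `archive_analysis` is hypothetical. Turning the module name into a lowercase hyphenated slug for `[module-name]` is also an assumption. The function refuses to write a file that is missing any template section or the "Integration Recommendations" section.

```python
import re
from pathlib import Path

# Template sections plus the extra section requested for Code-Migrator.
REQUIRED_HEADINGS = [
    "High-Level Purpose", "Entry Point(s)", "Key Components", "Dependencies",
    "Data Flow", "Integration Points", "Lessons for Implementation",
    "Integration Recommendations",
]


def archive_analysis(markdown: str, module_name: str, memories_root: str = "/memories") -> Path:
    """Check required sections, then write to <root>/repo/code-analysis-<slug>.md."""
    present = set(re.findall(r"^## (.+?)\s*$", markdown, flags=re.MULTILINE))
    missing = [h for h in REQUIRED_HEADINGS if h not in present]
    if missing:
        raise ValueError(f"analysis is missing sections: {missing}")
    # Assumed slug rule: lowercase, non-alphanumeric runs collapsed to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", module_name.lower()).strip("-")
    target = Path(memories_root) / "repo" / f"code-analysis-{slug}.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(markdown, encoding="utf-8")
    return target
```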