video-analyzer

Install

mkdir -p .claude/skills/video-analyzer && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/14822" && unzip -o skill.zip -d .claude/skills/video-analyzer && rm skill.zip

Installs to .claude/skills/video-analyzer

Activation

This is the description your AI agent reads to decide when to run this skill — the better it matches your request, the more reliably it fires.

Use this skill when analyzing existing video files using FFmpeg and AI vision, extracting frames for design system generation, detecting scene boundaries, analyzing animation timing, extracting color palettes, or understanding audio-visual sync. Triggers on video analysis, frame extraction, scene detection, ffprobe, motion analysis, and AI vision analysis of video content.

About this skill

When this skill is activated, always start your first response with the :mag: emoji.

Video Analyzer

Video analysis is the practice of extracting structured information from video files - metadata, keyframes, scene boundaries, color palettes, motion data, and audio characteristics. A well-built video analysis pipeline combines FFmpeg for frame extraction and signal processing with AI vision models for semantic understanding of visual content. This skill covers the full workflow from raw video files to actionable data: using ffprobe for metadata inspection, FFmpeg filter graphs for frame extraction and scene detection, audio analysis for silence and volume detection, and AI vision for design system extraction and content understanding.

The two pillars of video analysis are FFmpeg (the Swiss Army knife of media processing) and AI vision models (for understanding what is in each frame). FFmpeg handles the mechanical work - splitting video into frames, detecting scene changes via pixel difference thresholds, extracting audio waveforms. AI vision handles the semantic work - identifying UI components, reading text, extracting color values, and understanding layout patterns.


When to use this skill

Trigger this skill when the user:

  • Wants to extract frames from a video at regular intervals or scene boundaries
  • Needs to analyze video metadata (resolution, duration, codecs, bitrate)
  • Asks about scene detection or scene change timestamps
  • Wants to extract a color palette or design system from video content
  • Needs to analyze audio tracks (silence detection, volume levels, waveforms)
  • Asks about motion analysis or animation timing from video
  • Wants to use AI vision to understand video content frame by frame
  • Needs to generate thumbnails or preview strips from video files

Do NOT trigger this skill for:

  • Creating or editing videos from scratch - use remotion-video or video-creator
  • Writing video scripts or storyboards - use video-scriptwriting
  • Live video streaming or real-time video processing
  • Video encoding/transcoding for distribution (that is a rendering task, not analysis)

Key principles

  1. Extract then analyze - Always separate frame extraction (FFmpeg) from semantic analysis (AI vision). Trying to do both in one step leads to brittle pipelines. Extract frames to disk first, then analyze them.

  2. Use ffprobe before ffmpeg - Before processing any video, inspect it with ffprobe to understand its properties. Blindly running FFmpeg commands on unknown formats leads to silent failures and corrupted output.

  3. Scene detection over fixed intervals - When analyzing video content, extract frames at scene boundaries rather than fixed time intervals. Scene change frames capture the visual diversity of the video with far fewer frames than one-per-second extraction.

  4. JSON output everywhere - Use ffprobe's JSON output format and structure your analysis results as JSON. This makes pipelines composable and results machine-readable.

  5. Disk space awareness - Video frame extraction can generate thousands of large image files. Always estimate output size before extracting, use appropriate image formats (JPEG for analysis, PNG for pixel-perfect work), and clean up temporary frames after analysis.
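As a rough illustration of the disk space principle, the sketch below estimates how many frames an extraction would produce and how much space they might take. The function names `estimate_frames` and `estimate_size_mb` are hypothetical helpers, and the per-frame size in KB is an assumption you would supply yourself, for example by extracting one sample frame and checking its size.

```shell
# estimate_frames DURATION_SECONDS FPS
# Prints the approximate number of frames, rounded up.
estimate_frames() {
  awk -v d="$1" -v f="$2" 'BEGIN { n = d * f; c = int(n); if (c < n) c++; print c }'
}

# estimate_size_mb FRAME_COUNT KB_PER_FRAME
# Prints the approximate total size in MB with one decimal place.
estimate_size_mb() {
  awk -v n="$1" -v kb="$2" 'BEGIN { printf "%.1f\n", n * kb / 1024 }'
}

# Example: a 10-minute video sampled at 1 fps, assuming about 2 MB per PNG frame
frames=$(estimate_frames 600 1)
echo "frames: $frames, approx MB: $(estimate_size_mb "$frames" 2048)"
```

The actual frame count from FFmpeg can differ by a frame or so, so treat the result as an order-of-magnitude check rather than an exact figure.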


Core concepts

FFmpeg pipeline architecture

FFmpeg processes video through a pipeline of demuxing, decoding, filtering, encoding, and muxing. For analysis, we primarily use the decode and filter stages:

Input file -> Demuxer -> Decoder -> Filter graph -> Output (frames/data)

Key filter concepts for analysis:

  • select filter: choose which frames to output based on expressions
  • showinfo filter: print frame metadata (timestamps, picture type, etc.)
  • scene detection: pixel-level difference score between consecutive frames
  • fps filter: reduce frame rate to extract at regular intervals

Scene detection

Scene detection works by comparing consecutive frames using pixel difference. FFmpeg's select filter exposes this as the scene variable, a score from 0.0 (identical) to 1.0 (completely different). A threshold of 0.3-0.4 catches major scene changes while ignoring most camera motion and lighting shifts.

Threshold   Behavior
0.1-0.2     Very sensitive - catches pans, zooms, lighting changes
0.3-0.4     Balanced - catches cuts, transitions, major changes
0.5-0.7     Conservative - only hard cuts and dramatic scene changes
0.8-1.0     Too strict - misses most scene changes
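To pick a threshold for a particular video, it can help to see how many frames would pass at several candidate values. The sketch below assumes you have already captured per-frame scores as log lines containing `lavfi.scene_score=VALUE`; the function name `count_scene_changes` is a hypothetical helper.

```shell
# count_scene_changes THRESHOLD
# Reads log lines on stdin and prints how many scene scores exceed THRESHOLD.
count_scene_changes() {
  grep -o 'lavfi.scene_score=[0-9.]*' | cut -d= -f2 |
    awk -v t="$1" '$1 > t { n++ } END { print n + 0 }'
}

# Example with three made-up scores
scores='lavfi.scene_score=0.05
lavfi.scene_score=0.35
lavfi.scene_score=0.60'

for t in 0.1 0.3 0.5; do
  echo "threshold $t: $(printf '%s\n' "$scores" | count_scene_changes "$t") changes"
done
```

If the count jumps sharply between two neighbouring thresholds, the lower one is probably picking up motion or lighting changes rather than cuts.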

AI vision analysis workflow

The workflow for extracting structured data from video using AI vision:

  1. Probe - Get video metadata with ffprobe (duration, resolution, fps)
  2. Extract - Pull key frames at scene boundaries using FFmpeg
  3. Read - Load each frame image using the Read tool (supports images)
  4. Analyze - For each frame, identify colors, typography, layout, components
  5. Aggregate - Find consistent patterns across frames
  6. Output - Produce structured design system or content analysis

Common tasks

1. Install and verify FFmpeg

Check if FFmpeg is available and inspect its version and capabilities.

# Check FFmpeg installation
ffmpeg -version

# Check ffprobe installation
ffprobe -version

# Install on macOS
brew install ffmpeg

# Install on Ubuntu/Debian
sudo apt-get update && sudo apt-get install -y ffmpeg

# Verify supported formats
ffmpeg -formats 2>/dev/null | head -20

# Verify supported codecs
ffmpeg -codecs 2>/dev/null | grep -i h264

2. Extract key frames at scene boundaries

Extract only the frames where significant visual changes occur. This is the most efficient way to sample video content.

# Extract frames at scene changes (threshold 0.3)
# Note: -vsync is deprecated since FFmpeg 5.1; newer builds accept -fps_mode vfr instead
mkdir -p scenes
ffmpeg -i input.mp4 \
  -vf "select='gt(scene,0.3)',showinfo" \
  -vsync vfr \
  scenes/scene_%04d.png \
  2>&1 | grep showinfo

# Extract with timestamps logged to a file
ffmpeg -i input.mp4 \
  -vf "select='gt(scene,0.3)',showinfo" \
  -vsync vfr \
  scenes/scene_%04d.png \
  2>&1 | grep "pts_time" > scenes/timestamps.txt

# Extract scene frames as JPEG (smaller files, good for analysis)
mkdir -p scenes
ffmpeg -i input.mp4 \
  -vf "select='gt(scene,0.3)'" \
  -vsync vfr \
  -q:v 2 \
  scenes/scene_%04d.jpg
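The showinfo lines saved above carry a lot more than the timestamp. A minimal sketch for reducing them to plain seconds and a readable timecode follows; it assumes each line contains a `pts_time:VALUE` token, and `extract_pts_times` and `to_hms` are hypothetical helper names.

```shell
# extract_pts_times
# Reads showinfo log lines on stdin and prints one timestamp (seconds) per line.
extract_pts_times() {
  grep -o 'pts_time:[0-9.]*' | cut -d: -f2
}

# to_hms SECONDS
# Formats a number of seconds as HH:MM:SS.mmm.
to_hms() {
  awk -v s="$1" 'BEGIN {
    h = int(s / 3600)
    m = int((s - h * 3600) / 60)
    sec = s - h * 3600 - m * 60
    printf "%02d:%02d:%06.3f\n", h, m, sec
  }'
}

# Example with a made-up showinfo line
line='[Parsed_showinfo_1 @ 0x55d0] n:   0 pts:  38038 pts_time:93.5   duration: 1001'
t=$(printf '%s\n' "$line" | extract_pts_times)
echo "$t -> $(to_hms "$t")"
```

In practice you would pipe `scenes/timestamps.txt` into `extract_pts_times` instead of the sample line.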

3. Extract frames at regular intervals

When you need evenly spaced samples regardless of content changes.

# Extract one frame per second
mkdir -p frames
ffmpeg -i input.mp4 -vf "fps=1" frames/frame_%04d.png

# Extract one frame every 5 seconds
mkdir -p frames
ffmpeg -i input.mp4 -vf "fps=1/5" frames/frame_%04d.png

# Extract only I-frames (keyframes from the codec)
mkdir -p keyframes
ffmpeg -i input.mp4 \
  -vf "select='eq(pict_type,I)'" \
  -vsync vfr \
  keyframes/kf_%04d.png

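# Extract a single frame at a specific timestamp
# (-ss after -i decodes from the start, which is slow on long files; placing -ss before -i seeks first and is faster)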
ffmpeg -i input.mp4 -ss 00:01:30 -frames:v 1 thumbnail.png

# Extract first frame only
ffmpeg -i input.mp4 -frames:v 1 first_frame.png

4. Analyze video metadata with ffprobe

Inspect video properties before processing. Always use JSON output for machine-readable results.

# Full metadata as JSON (streams and format)
ffprobe -v quiet \
  -print_format json \
  -show_format \
  -show_streams \
  input.mp4

# Get duration only
ffprobe -v error \
  -show_entries format=duration \
  -of default=noprint_wrappers=1:nokey=1 \
  input.mp4

# Get resolution
ffprobe -v error \
  -select_streams v:0 \
  -show_entries stream=width,height \
  -of csv=s=x:p=0 \
  input.mp4

# Get frame rate (returned as a fraction, e.g. 30000/1001)
ffprobe -v error \
  -select_streams v:0 \
  -show_entries stream=r_frame_rate \
  -of default=noprint_wrappers=1:nokey=1 \
  input.mp4

# Get codec information
ffprobe -v error \
  -select_streams v:0 \
  -show_entries stream=codec_name,codec_long_name,profile \
  -of json \
  input.mp4

# Count total frames (decodes the whole stream, so this can be slow on long videos)
ffprobe -v error \
  -count_frames \
  -select_streams v:0 \
  -show_entries stream=nb_read_frames \
  -of default=noprint_wrappers=1:nokey=1 \
  input.mp4
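ffprobe reports `r_frame_rate` as a fraction such as `30000/1001` rather than a decimal. A small sketch for converting it is shown below; `fps_to_decimal` is a hypothetical helper name, and three decimal places is an arbitrary choice.

```shell
# fps_to_decimal FRACTION
# Converts an ffprobe frame rate like 30000/1001 to a decimal with 3 places.
fps_to_decimal() {
  awk -v r="$1" 'BEGIN {
    n = split(r, p, "/")
    if (n == 2 && p[2] != 0) printf "%.3f\n", p[1] / p[2]
    else printf "%.3f\n", p[1]
  }'
}

fps_to_decimal 30000/1001   # prints 29.970
fps_to_decimal 25/1         # prints 25.000
```

The output of the frame rate command above can be passed straight in, for example `fps_to_decimal "$(ffprobe ... input.mp4)"`.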

5. Detect scenes and list timestamps

Get a list of scene change timestamps without extracting frames.

# List scene change timestamps
ffmpeg -i input.mp4 \
  -vf "select='gt(scene,0.3)',showinfo" \
  -f null - \
  2>&1 | grep pts_time

# Extract scene scores for every frame (for analysis)
ffmpeg -i input.mp4 \
  -vf "select='gte(scene,0)',metadata=print" \
  -f null - \
  2>&1 | grep "lavfi.scene_score"

# Count number of scene changes
ffmpeg -i input.mp4 \
  -vf "select='gt(scene,0.3)',showinfo" \
  -f null - \
  2>&1 | grep -c "pts_time"
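Once you have a list of cut timestamps, turning them into segments with start, end, and length is often more useful for timing analysis. The sketch below assumes the timestamps are already reduced to one number per line in ascending order and that the total duration comes from ffprobe; `scene_segments` is a hypothetical helper name.

```shell
# scene_segments TOTAL_DURATION
# Reads cut timestamps (seconds, ascending, one per line) on stdin and
# prints "start end length" for each segment, including the final one.
scene_segments() {
  awk -v total="$1" '
    BEGIN { start = 0 }
    { printf "%.3f %.3f %.3f\n", start, $1, $1 - start; start = $1 }
    END { printf "%.3f %.3f %.3f\n", start, total, total - start }
  '
}

# Example: cuts at 2.5 s and 7 s in a 10-second video
printf '2.5\n7\n' | scene_segments 10
```

For the example input this prints three segments: 0 to 2.5, 2.5 to 7, and 7 to 10 seconds.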

6. Extract audio waveform and detect silence

Analyze the audio track for silence gaps, volume levels, and visual waveforms.

# Detect silence periods (useful for finding chapter breaks)
ffmpeg -i input.mp4 \
  -af silencedetect=noise=-30dB:d=0.5 \
  -f null - \
  2>&1 | grep silence

# Generate audio waveform as image
ffmpeg -i input.mp4 \
  -filter_complex "showwavespic=s=1920x200:colors=blue" \
  -frames:v 1 \
  waveform.png

# Analyze volume levels
ffmpeg -i input.mp4 \
  -af volumedetect \
  -f null - \
  2>&1 | grep volume

# Extract audio spectrum visualization
ffmpeg -i input.mp4 \
  -filter_complex "showspectrumpic=s=1920x512:color=intensity" \
  -frames:v 1 \
  spectrum.png
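The silencedetect log lines can be paired up into start and end times. The sketch below assumes the log contains `silence_start: VALUE` and `silence_end: VALUE` tokens separated by spaces, as recent FFmpeg builds print them; `silence_intervals` is a hypothetical helper name.

```shell
# silence_intervals
# Reads silencedetect log lines on stdin and prints "start end" per silence.
silence_intervals() {
  awk '
    /silence_start:/ {
      for (i = 1; i <= NF; i++) if ($i == "silence_start:") start = $(i + 1)
    }
    /silence_end:/ {
      for (i = 1; i <= NF; i++) if ($i == "silence_end:") printf "%s %s\n", start, $(i + 1)
    }
  '
}

# Example with made-up log lines
log='[silencedetect @ 0x5581] silence_start: 12.5
[silencedetect @ 0x5581] silence_end: 15.75 | silence_duration: 3.25'
printf '%s\n' "$log" | silence_intervals
```

A silence that runs to the end of the file may have no matching `silence_end` line, in which case this sketch prints nothing for it.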

7. AI vision analysis workflow

Extract frames then analyze them with Claude's vision capability to extract structured information from video content.

# Step 1: Probe the video
ffprobe -v quiet -print_format json -show_format -show_streams input.mp4

# Step 2: Extract scene frames
mkdir -p analysis_frames
ffmpeg -i input.mp4 \
  -vf "select='gt(scene,0.3)'" \
  -vsync vfr \
  -q:v 2 \
  analysis_frames/frame_%04d.jpg
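Before loading frames one by one, it may help to write a small JSON manifest of what was extracted, in keeping with the JSON-everywhere principle. This is only a sketch: `frame_manifest` and the `file` and `bytes` field names are hypothetical, and it assumes filenames contain no quotes or backslashes.

```shell
# frame_manifest DIRECTORY
# Prints a JSON array describing each .jpg file in DIRECTORY.
frame_manifest() (
  dir="$1"
  first=1
  printf '['
  for f in "$dir"/*.jpg; do
    [ -e "$f" ] || continue          # skip the unexpanded glob when no files match
    if [ "$first" -eq 1 ]; then first=0; else printf ','; fi
    size=$(wc -c < "$f" | tr -d ' ')
    printf '{"file":"%s","bytes":%s}' "$f" "$size"
  done
  printf ']\n'
)

# Example
frame_manifest analysis_frames > analysis_frames.json
```

The manifest gives the later aggregation step a stable list of frames to iterate over and a quick check on total size.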

After extracting frames, use the Read tool to load each image. The Read tool supports image files (PNG, JPG, etc.) and will present them visually. For each frame, analyze:

  • Colors: Extract dominant hex color values, backg

Content truncated.