notebooklm-knowledge-base-organizer

Name: notebooklm-knowledge-base-organizer
Author: agmangas

Install

mkdir -p .claude/skills/notebooklm-knowledge-base-organizer && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/16000" && unzip -o skill.zip -d .claude/skills/notebooklm-knowledge-base-organizer && rm skill.zip

Installs to .claude/skills/notebooklm-knowledge-base-organizer

Activation

This is the description your AI agent reads to decide when to run this skill — the better it matches your request, the more reliably it fires.

Use when preparing files for NotebookLM, organizing documents into a knowledge base, converting formats for NotebookLM compatibility, or reducing a large document collection to fit NotebookLM's 50-source limit. Scores and prioritizes sources, performs strategic merging (time-series, topic-based, format consolidation), converts unsupported formats (PPTX to PDF, XLSX to CSV), applies flat structure with descriptive snake_case names, and optimizes for RAG retrieval performance.


About this skill

NotebookLM Knowledge Base Organizer

Prepares files for optimal use in NotebookLM by intelligently selecting and consolidating sources, converting formats, organizing structure, and ensuring compatibility. The primary constraint is NotebookLM's 50-source limit per notebook. When collections exceed this limit, systematic scoring, prioritization, and strategic merging reduce source count without losing valuable information.

When to Use This Skill

You have 50+ files and need to optimize for NotebookLM's limit
Preparing documents for a new NotebookLM notebook
Converting a messy folder into NotebookLM-ready sources
Files are in unsupported formats (PPTX, XLSX, complex PDFs)
Documents exceed 500k words or 200MB per file
Building a knowledge base for research, projects, or learning
Large document collections (100-300 files) need intelligent prioritization

What This Skill Does

Scores and Prioritizes Sources (when >50 detected) using Relevance, Recency, Uniqueness, and Information Density (0-40 scale)
Strategic Merging via time-series (daily to monthly), topic-based (related papers to comprehensive guides), and format consolidation (slides + transcript to unified PDF)
Converts to Supported Formats (PPTX to PDF, XLSX to CSV, scanned to OCR)
Applies Flat Structure with descriptive snake_case naming
Removes Duplicates across formats
Splits Large Files exceeding 500k words into parts
Optimizes for RAG with smaller, focused documents for better retrieval

NotebookLM Supported Formats

Supported:

PDF (text-selectable, not scanned images)
Google Docs, Sheets (<100k tokens), Slides (<100 slides)
Microsoft Word (.docx)
Text files (.txt, .md)
Images (PNG, JPEG, TIFF, WEBP)
Audio (MP3, WAV, AAC, OGG with clear speech)
URLs (websites, YouTube, Google Drive links)
Copy-pasted text

Convert These:

PPTX to PDF
XLSX to CSV or Google Sheets
Scanned PDFs to OCR text-selectable PDF
Large Sheets to CSV (<100k tokens)

File Limits

Per Source:

500,000 words max
200MB file size max
No page limit (word limit matters)

Per Notebook (Free):

50 sources maximum (hard limit)

Per Account (Free):

100 notebooks total

Prefer many smaller, focused documents over few large ones for better RAG retrieval. The 50-source limit is the primary optimization constraint.

IMPORTANT: Preserve original file timestamps during all operations. Timestamps show which additions, meeting minutes, and key decisions are the most recent. Use touch -r original converted after conversions. Include dates in filenames in year-month-day order, written with underscores (YYYY_MM_DD) to match the naming pattern in Step 5.

How to Use

Prepare these files for NotebookLM - convert formats and organize with descriptive names

Convert all PPTX and XLSX files to NotebookLM-compatible formats

Check if any files exceed NotebookLM's 500k word or 200MB limits

Organize this research folder for a NotebookLM knowledge base

Find duplicate content across different file formats

Split this large PDF into NotebookLM-compatible chunks

Instructions

When a user requests NotebookLM organization, follow these steps.

Step 1: Assess and Prioritize Sources

Count and evaluate before proceeding with any organization.

total_sources=$(find . -type f \( -name "*.pdf" -o -name "*.docx" -o -name "*.txt" -o -name "*.md" -o -name "*.csv" \) | wc -l)
echo "Total sources found: $total_sources"

If total exceeds 50:

Score all sources using the 4-dimension rubric (Relevance, Recency, Uniqueness, Density, each 0-10). See references/scoring-system.md for the full rubric, assessment commands, and batch scoring script.
Rank and select top candidates using the decision matrix. Target 35-40 auto-keep sources initially. See references/prioritization-strategy.md for the selection process and space-based adjustments.

Identify merge candidates -- find time-series patterns, topic clusters, and multi-format duplicates:

# Time-series opportunities
find . -name "*_20[0-9][0-9]_[0-9][0-9]_*" | \
  sed 's/_20[0-9][0-9]_[0-9][0-9]_[0-9][0-9]//' | sort | uniq -c | sort -rn

# Topic clusters
find . -type f -name "*.pdf" | xargs -I {} basename {} .pdf | \
  sed 's/_part_[0-9]*//;s/_[0-9][0-9]*$//' | sort | uniq -c | sort -rn | awk '$1 > 2'

Execute strategic merges using appropriate patterns. See references/merging-strategies.md for time-series, topic-based, and format consolidation scripts. Preserve timestamps on all merged outputs.
Recount and validate the final total is at or below 50 (ideally 48 to reserve slots for future additions).
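
The rubric itself lives in references/scoring-system.md, which is not reproduced here. As a minimal sketch of the ranking step only, assuming scores have already been recorded in a hypothetical tab-separated file (path, relevance, recency, uniqueness, density), the 0-40 totals could be computed and sorted like this. The function name rank_sources and the file layout are illustrative assumptions, not part of the skill's reference scripts.

```shell
# rank_sources: hypothetical helper, not one of the skill's reference scripts.
# Input (stdin): TSV lines of path<TAB>relevance<TAB>recency<TAB>uniqueness<TAB>density
# Output: "total<TAB>path", sorted from highest to lowest total (0-40 scale).
rank_sources() {
  awk -F '\t' '{ printf "%d\t%s\n", $2 + $3 + $4 + $5, $1 }' |
    sort -t "$(printf '\t')" -k1,1nr
}
```

For example, rank_sources < scores.tsv | head -n 40 would list the 40 highest-scoring candidates, in line with the 35-40 auto-keep target above.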

Step 2: Understand the Scope

Ask clarifying questions:

What is the topic/purpose of this knowledge base?
Which directory contains the source materials?
Target: single notebook or multiple related notebooks?
Any files that must stay in original format?
Is this for research, learning, project documentation, or reference?

Step 3: Analyze Current State

Review files for NotebookLM compatibility:

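find . -type f -exec file {} \;                               # Detect actual file types
find . -type f -exec du -h {} \; | sort -rh                   # File sizes, largest first
find . -type f | sed 's/.*\.//' | sort | uniq -c | sort -rn   # Count files per extension
for f in *.pdf; do printf '%s\t' "$f"; pdftotext "$f" - | wc -w; done  # Word count per PDF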

Categorize findings:

Compatible as-is: PDF, DOCX, TXT, MD, images
Needs conversion: PPTX, XLSX, XLS, PPT, scanned PDFs
Too large: Files >500k words or >200MB
Duplicates: Same content in different formats
Merge candidates: Sources identified for consolidation in Step 1
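
A first pass over the categories above can be made from file extensions alone. The sketch below is an assumption-laden illustration: classify_source is a hypothetical name, the category labels are invented for this example, and csv is treated as compatible only because Step 1 counts it as a source. Scanned PDFs, oversized files, and duplicates still need the content checks shown earlier.

```shell
# classify_source: hypothetical helper that maps a filename to a Step 3
# category using its extension only (case-insensitive).
classify_source() {
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    pdf|docx|txt|md|csv|png|jpg|jpeg|tiff|webp) echo "compatible" ;;
    pptx|ppt|xlsx|xls) echo "needs_conversion" ;;
    *) echo "review_manually" ;;
  esac
}
```

Running it over a folder, for example with find . -type f | while IFS= read -r f; do printf '%s\t%s\n' "$(classify_source "$f")" "$f"; done, gives a quick list to sort by category.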

Step 4: Convert Unsupported Formats

PowerPoint to PDF:

for f in *.pptx; do
  soffice --headless --convert-to pdf "$f"
  touch -r "$f" "${f%.pptx}.pdf"  # Preserve timestamp
done

Excel to CSV:

for f in *.xlsx; do
  soffice --headless --convert-to csv:"Text - txt - csv (StarCalc)":44,34,UTF8 "$f"
  touch -r "$f" "${f%.xlsx}.csv"  # Preserve timestamp
done

Scanned PDF to Searchable:

ocrmypdf input.pdf output_searchable.pdf
touch -r input.pdf output_searchable.pdf  # Preserve timestamp
pdftotext output_searchable.pdf - | wc -w  # Verify text extraction

WARNING: Always run touch -r original converted after every conversion to preserve the original file timestamp.
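
To confirm that a conversion kept the original timestamp, the two modification times can be compared directly. This is a small sketch, not part of the skill: same_mtime is a hypothetical helper that relies only on find -newer, so it should behave the same on Linux and macOS.

```shell
# same_mtime: hypothetical check that two files have the same modification
# time. Succeeds (exit 0) only when neither file is newer than the other.
same_mtime() {
  [ -z "$(find "$1" -newer "$2")" ] && [ -z "$(find "$2" -newer "$1")" ]
}
```

A typical use after a conversion would be same_mtime slides.pptx slides.pdf || echo "timestamp not preserved", where the filenames are placeholders.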

Step 5: Apply Naming

Use this pattern: category_topic_descriptor_YYYY_MM_DD.ext

Examples:

research_quantum_computing_basics_2025.pdf
meeting_notes_project_kickoff_2026_01_15.txt
client_proposal_acme_corp_final.docx
reference_api_documentation_v2.md
data_sales_figures_q4_2025.csv

See references/organization-scripts.md for the automated naming script. Preserve timestamps when renaming: use mv (preserves by default) and verify with stat.
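
The automated naming script is in the referenced file and is not shown here. As a rough sketch of the snake_case part of the pattern only, the hypothetical helper below lowercases a filename, collapses every run of non-alphanumeric characters in the stem into one underscore, and keeps the extension. It does not choose a category or add a date; those remain manual decisions.

```shell
# to_snake_case: hypothetical helper, not the script from
# references/organization-scripts.md. Prints a snake_case version of a filename.
to_snake_case() {
  name=$1
  ext=""
  case "$name" in
    *.*)
      ext=".$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')"
      name=${name%.*}
      ;;
  esac
  stem=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]' |
    sed -e 's/[^a-z0-9][^a-z0-9]*/_/g' -e 's/^_//' -e 's/_$//')
  printf '%s%s\n' "$stem" "$ext"
}
```

A rename would then look like mv "$f" "$(to_snake_case "$f")", ideally after printing the proposed names for review.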

Step 6: Split Large Documents

For files >500k words or >200MB:

pdftotext document.pdf - | wc -w  # Check word count
pdftk large.pdf cat 1-500 output large_part_1.pdf
pdftk large.pdf cat 501-1000 output large_part_2.pdf
touch -r large.pdf large_part_1.pdf large_part_2.pdf  # Preserve timestamps

Name parts by content, not arbitrary numbers:

annual_report_2025_part_1_executive_summary.pdf
annual_report_2025_part_2_financials.pdf
annual_report_2025_part_3_appendices.pdf
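
Before picking page ranges, it can help to know the minimum number of parts the word limit forces. The arithmetic is a ceiling division, sketched here with a hypothetical helper named parts_needed; the 500,000-word default comes from the limits above, and the actual split points should still follow the document's content.

```shell
# parts_needed: hypothetical helper. Given a word count, prints the minimum
# number of parts so that each part stays at or under the limit
# (second argument, default 500000 words).
parts_needed() {
  words=$1
  limit=${2:-500000}
  echo $(( (words + limit - 1) / limit ))
}
```

For instance, parts_needed "$(pdftotext document.pdf - | wc -w)" reports the minimum for a given PDF.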

Step 7: Consolidation Pass

Perform strategic merging to optimize source count. This step is critical when merge candidates were identified in Step 1 or the collection is near the 50-source limit.

Merging is a primary optimization strategy, not a last resort. Three patterns apply:

Time-series: Combine chronological documents into period summaries (daily to monthly, weekly to quarterly)
Topic-based: Combine related papers/docs into comprehensive guides with chapter markers
Format consolidation: Combine slides + transcript + notes for the same event into a single PDF

See references/merging-strategies.md for full merge patterns, scripts (time-series merger, topic-based PDF merger), decision trees, and quality checks.

IMPORTANT: Preserve chronological timestamps in merged content. Add clear date headers within merged files so temporal context is not lost.

Log all merge decisions for inclusion in the organization plan.
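
The full merge scripts are in the referenced file. The following is only a minimal sketch of the time-series pattern for plain-text files, under several assumptions: daily files are named prefix_YYYY_MM_DD.txt, the helper name merge_month is hypothetical, and giving the merged file the timestamp of the latest daily file is one possible reading of "preserve timestamps", not a rule stated by the skill.

```shell
# merge_month: hypothetical helper. Concatenates daily text files named
# <prefix>_YYYY_MM_DD.txt for one month into <output>, with a date header
# before each day's content.
# Usage: merge_month <dir> <prefix> <YYYY> <MM> <output>
merge_month() {
  dir=$1; prefix=$2; year=$3; month=$4; out=$5
  : > "$out"
  latest=""
  # The glob expands in name order, which is chronological for this pattern.
  for f in "$dir/${prefix}_${year}_${month}"_[0-9][0-9].txt; do
    [ -e "$f" ] || continue
    day=${f%.txt}; day=${day##*_}
    printf '===== %s-%s-%s =====\n' "$year" "$month" "$day" >> "$out"
    cat "$f" >> "$out"
    printf '\n' >> "$out"
    latest=$f
  done
  # Assumption: the merged file takes the timestamp of the latest daily file.
  if [ -n "$latest" ]; then touch -r "$latest" "$out"; fi
}
```

For example, merge_month . standup 2026 01 standup_2026_01.txt would fold a month of daily notes into one source while keeping each day's date visible.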

Step 8: Implement Flat Structure

NotebookLM works best with flat source lists, no nested folders.

Before:

docs/
  project/
    planning/
      requirements.pdf
    research/
      background.pdf
  reference/
    api_docs.pdf

After:

notebooklm_sources/
  project_requirements_2026.pdf
  project_background_research.pdf
  reference_api_documentation.pdf

See references/organization-scripts.md for the implementation script. Preserve timestamps when copying: use cp -p to maintain original dates.
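
The implementation script is in the referenced file. As an illustrative sketch only, flattening can be done by turning each file's relative path into its name. Both helper names below are hypothetical, and the generated names (for example project_planning_requirements.pdf) are purely mechanical, unlike the hand-chosen names in the example above. The destination is assumed to be outside the source tree.

```shell
# flat_name: hypothetical helper. Turns a relative path such as
# ./project/planning/requirements.pdf into project_planning_requirements.pdf.
flat_name() {
  printf '%s\n' "${1#./}" | tr '/' '_'
}

# flatten_tree: hypothetical helper. Copies every file under <src> into the
# single directory <dest>, using cp -p to keep original modification times.
# Usage: flatten_tree <src> <dest>
flatten_tree() {
  src=$1; dest=$2
  mkdir -p "$dest"
  (cd "$src" && find . -type f) | while IFS= read -r rel; do
    cp -p "$src/$rel" "$dest/$(flat_name "$rel")"
  done
}
```

With the layout shown above, flatten_tree docs notebooklm_sources would produce one flat folder ready for a naming pass.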

Step 9: Find and Remove Duplicates

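# Exact duplicates: files whose MD5 hash repeats (on macOS without md5sum, use md5 -r)
find . -type f -exec md5sum {} + | sort | awk '$1 == prev { print } { prev = $1 }'
# Same base name in different formats (GNU find; -printf is not available in BSD/macOS find)
find . -type f -printf '%f\n' | sed 's/\.[^.]*$//' | sort | uniq -d
# Same extracted text in different PDFs: identical hashes end up on adjacent lines
for pdf in *.pdf; do printf '%s  %s\n' "$(pdftotext "$pdf" - | md5sum | cut -d' ' -f1)" "$pdf"; done | sort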

Decision matrix:

Same content, different formats: keep PDF (best for NotebookLM)
Same content, different names: keep most descriptive name
Slight variations: merge into single document if <500k words
Truly duplicate: delete older version (check timestamps first)

Step 10: Optimize for RAG

NotebookLM uses RAG, which works best with focused documents:

Split 100-page documents into 3-5 topic-focused files
Separate chapters/sections into individual sources
Keep each source focused on one topic/subtopic
Prefer 20-50 pages per PDF over 200+ page megadocs

Instead of:
  company_handbook_500_pages.pdf

Create:
  handbook_code_of_conduct.pdf
  handbook_benefits_overview.pdf
  handbook_time_off_policy.pdf
  handbook_remote_work_guidelines.pdf
  handbook_career_development.pdf

Step 11: Propose Organization Plan

Present a plan to the user before making changes. The plan should cover current state, source selection strategy (if >50

Content truncated.


Safety

Review before install

Runs shell / code

Automated static scan of the SKILL.md and repo. A flag describes what the skill can do — not a verdict. Always review code before installing.

Source & maintenance

Updated: 4mo ago
Loads: ~4,224 tokens
