wechat-batch-crawl

by JourneytoNewland

Install

mkdir -p .claude/skills/wechat-batch-crawl && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/14736" && unzip -o skill.zip -d .claude/skills/wechat-batch-crawl && rm skill.zip

Installs to .claude/skills/wechat-batch-crawl

About this skill

WeChat Batch Crawl Skill

Overview

Batch-crawls WeChat Official Account articles, with anti-crawl handling, date filtering, and automatic deduplication.

Quick Start

# Crawl today's articles
python resources/wechat_batch_scraper.py --date today

# Crawl a specific date
python resources/wechat_batch_scraper.py --date 2026-01-20

# List articles only (no crawling)
python resources/wechat_batch_scraper.py --date today --list-only
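
The `--date` values above are either the keyword `today` or an ISO date. As a rough illustration of how such a value could be resolved, here is a minimal sketch. The function name `resolve_date` and the `yesterday` keyword are assumptions for illustration; the source does not show how the script parses this flag.

```python
from datetime import date, timedelta

def resolve_date(value, today=None):
    """Turn a --date argument into a concrete date.

    Accepts 'today', 'yesterday', or an ISO 'YYYY-MM-DD' string.
    'yesterday' is an assumption; the source only shows 'today' and ISO dates.
    """
    today = today or date.today()
    if value == "today":
        return today
    if value == "yesterday":
        return today - timedelta(days=1)
    # date.fromisoformat raises ValueError on malformed input
    return date.fromisoformat(value)
```

Passing `today` explicitly keeps the function deterministic, which makes it easier to check phrases such as "yesterday" around month or year boundaries.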

Natural Language Patterns

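Intent	Supported Phrases	English Gloss
Crawl today	"爬取今天的微信文章" / "获取今天的文章" / "抓今天的公众号"	"crawl today's WeChat articles" / "get today's articles" / "grab today's official accounts"
Crawl yesterday	"爬取昨天的微信文章" / "获取昨天的文章"	"crawl yesterday's WeChat articles" / "get yesterday's articles"
Crawl a specific date	"爬取1月20号的文章" / "获取上周一的文章"	"crawl the articles from January 20" / "get last Monday's articles"
List only	"今天有哪些文章" / "列出今天的文章" / "看看有啥新文章"	"what articles are there today" / "list today's articles" / "see what new articles there are"
Incremental crawl	"继续爬取" / "爬取新增的文章"	"continue crawling" / "crawl the newly added articles"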

Decision Boundaries

✅ Claude May Decide

Date parsing from natural language formats
Output directory naming and organization
Retry strategy within configured limits
Log verbosity and report format

❌ Claude Must NOT Change

RSS feed URL configuration
Anti-crawl delays (must stay 5-15 seconds)
Max workers (must not exceed 3)
Retry limits (max 3 retries)
Request method (must use curl via subprocess)
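
The fixed limits listed above could be enforced with a small validation step before a crawl starts. The sketch below is one possible shape, not the skill's actual code: `CrawlLimits`, its field names, and `validate` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CrawlLimits:
    # Hypothetical container; field names are illustrative, not from the skill.
    min_delay: float = 5.0
    max_delay: float = 15.0
    max_workers: int = 3
    max_retries: int = 3

def validate(limits):
    """Return a list of violations of the fixed limits; an empty list means OK."""
    problems = []
    if (limits.min_delay < 5 or limits.max_delay > 15
            or limits.min_delay > limits.max_delay):
        problems.append("delay must stay within 5-15 seconds")
    if not 1 <= limits.max_workers <= 3:
        problems.append("max_workers must be between 1 and 3")
    if not 0 <= limits.max_retries <= 3:
        problems.append("max_retries must be between 0 and 3")
    return problems
```

Returning a list of messages rather than raising on the first problem lets a caller report every violated limit at once.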

Core Implementation

Anti-Crawl: Use curl via subprocess

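import subprocess

# Fetch the page with curl instead of an in-process HTTP client
result = subprocess.run(
    ['curl', '-s', '-A',  # -s: silent mode, -A: set the User-Agent header
     'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
     url],
    capture_output=True,  # the page body ends up in result.stdout
    text=True,
    timeout=30            # raises subprocess.TimeoutExpired after 30 seconds
)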
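
Combining the curl call above with the retry limit of 3, a fetch helper might look like the following sketch. The name `fetch`, its parameters, and the injectable `run` and `sleep` arguments are assumptions made so the logic can be exercised without network access. Note that without `-f`, curl exits with status 0 even on HTTP error responses, so this version only detects transport failures, timeouts, and empty bodies.

```python
import subprocess
import time

USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

def fetch(url, retries=3, delay=5.0, run=subprocess.run, sleep=time.sleep):
    """Fetch url with curl, retrying up to `retries` times after the first attempt.

    Returns the response body, or None if every attempt fails.
    """
    for attempt in range(retries + 1):
        try:
            result = run(
                ["curl", "-s", "-A", USER_AGENT, url],
                capture_output=True, text=True, timeout=30,
            )
            if result.returncode == 0 and result.stdout:
                return result.stdout
        except subprocess.TimeoutExpired:
            pass  # treat a timeout like any other failed attempt
        if attempt < retries:
            sleep(delay)  # wait before the next attempt
    return None
```

A caller would typically pass the adaptive delay as `delay` so that retries respect the same pacing as regular requests.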

Smart Delay: Adaptive by time

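import random
from datetime import datetime

def get_adaptive_delay(self):
    hour = datetime.now().hour
    if 9 <= hour <= 18:
        return random.uniform(10, 15)  # daytime peak
    if 19 <= hour <= 23:
        return random.uniform(7, 12)   # evening
    return random.uniform(5, 7)        # late night; floor raised from 3 to honor the 5-15 s limit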
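
Because the delay must stay within 5-15 seconds, it can help to separate the hour-to-range mapping from the clock so the bounds are easy to check. The sketch below is a hypothetical variant: `delay_for_hour` is not part of the skill, and the late-night floor of 5 seconds is an assumption chosen to satisfy that limit.

```python
import random

def delay_for_hour(hour, rng=random):
    """Pick a delay in seconds for the given hour, staying inside 5-15 s."""
    if 9 <= hour <= 18:
        low, high = 10, 15   # daytime peak
    elif 19 <= hour <= 23:
        low, high = 7, 12    # evening
    else:
        low, high = 5, 7     # late night; floor of 5 is an assumption
    return rng.uniform(low, high)
```

Taking the hour as a parameter means every hour of the day can be checked against the limit without mocking `datetime.now()`.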

Deduplication: Filter before scraping

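def filter_existing(self, urls, output_dir):
    # collect_scraped_urls is a helper defined elsewhere in the scraper (not shown here)
    scraped_urls = collect_scraped_urls(output_dir)
    # Keep the input order and drop URLs that were already saved
    return [url for url in urls if url not in scraped_urls]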
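
The helper `collect_scraped_urls` is not shown in the source. One plausible way to implement it is to read the `_metadata.json` file in each date folder, as sketched below. The metadata schema (a JSON list of objects with a `"url"` key) is purely an assumption; the real file format may differ.

```python
import json
from pathlib import Path

def collect_scraped_urls(output_dir):
    """Gather URLs recorded in every <date>/_metadata.json under output_dir.

    Assumes each file holds a JSON list of objects with a "url" key;
    the real schema is not shown in the source.
    """
    seen = set()
    for meta in Path(output_dir).glob("*/_metadata.json"):
        try:
            entries = json.loads(meta.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or corrupt metadata
        if not isinstance(entries, list):
            continue  # unexpected shape; ignore rather than guess
        seen.update(e["url"] for e in entries if isinstance(e, dict) and "url" in e)
    return seen

def filter_existing(urls, output_dir):
    """Standalone version of the method above: drop already-scraped URLs."""
    scraped = collect_scraped_urls(output_dir)
    return [u for u in urls if u not in scraped]
```

Using a set for the scraped URLs keeps each membership test constant-time, which matters once the output directory holds many days of articles.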

Hooks

Pre-check (hooks/pre_check.py)

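from importlib.util import find_spec

def is_installed(package):
    # Assumed implementation; the original helper is not shown
    return find_spec(package) is not None

def check_dependencies():
    required = ['bs4', 'html2text', 'feedparser']
    missing = [p for p in required if not is_installed(p)]
    if missing:
        print(f"Missing dependencies: pip install {' '.join(missing)}")
        return False
    return True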

Post-summary (hooks/post_summary.py)

def generate_summary(results):
    success = sum(1 for r in results if r['success'])
    total = len(results)
    rate = success / total * 100 if total else 0.0  # avoid dividing by zero on an empty run
    print(f"Done: {success}/{total} ({rate:.1f}%)")

Output Structure

output_dir/
└── 2026-01-20/
    ├── 001_article-title.md
    ├── 002_article-title.md
    └── _metadata.json
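
File names in this layout combine a three-digit sequence number with the article title. A minimal sketch of how such names could be generated follows. The function `article_filename`, the set of replaced characters, and the 80-character cap are assumptions; the source does not describe its sanitization rules.

```python
import re

def article_filename(index, title, max_len=80):
    """Build a name like '001_Some title.md' from a 1-based index and a title."""
    # Replace characters that are invalid in file names on common platforms.
    safe = re.sub(r'[\\/:*?"<>|\r\n\t]+', "_", title).strip(" ._")
    # Cap the length and fall back to a placeholder if nothing usable remains.
    safe = safe[:max_len] or "untitled"
    return f"{index:03d}_{safe}.md"
```

For example, `article_filename(1, "Hello World")` yields `001_Hello World.md`, and a title made only of separators falls back to `untitled`.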

Workflow Integration

wechat-batch-crawl → content-summarizer → knowledge-manager
     (crawl)        (summarize highlights)  (update knowledge base)

Troubleshooting

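Problem	Solution
502 errors	Make sure requests go through curl, not the requests library
Frequent failures	Check whether you are crawling at peak hours and increase the delay
Duplicate crawls	Check the output_dir path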

Safety

Review before install

Runs shell / code
Network access

Automated static scan of the SKILL.md and repo. A flag describes what the skill can do — not a verdict. Always review code before installing.

Source & maintenance

Updated	5mo ago
License	MIT
Repo stars	7
Loads	~684 tokens

Stars are for the whole repository, not this skill alone.
