agentskills.codes

llm-cost-report

Weekly LLM spend and unit-economics report: cost by stage (Haiku classifier vs Sonnet reply), by product (Solo 1:1 vs Family group), and per household, plotted against the price. Surfaces margin creep before it becomes a loss. Use to report spend, check the margin, or size the cost impact of a feature.

Install

mkdir -p .claude/skills/llm-cost-report && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/16472" && unzip -o skill.zip -d .claude/skills/llm-cost-report && rm skill.zip

Installs to .claude/skills/llm-cost-report

Activation

This is the description your AI agent reads to decide when to run this skill — the better it matches your request, the more reliably it fires.

Weekly LLM spend + unit-economics report — cost by stage (Haiku classifier vs Sonnet reply), by product (Solo 1:1 vs Family group), and per household, plotted against the price. Surfaces margin creep before it becomes a loss. Use to report spend, check the margin, or size the cost impact of a feature. Triggers - cost report, spend, margin, unit economics, how much per household.
381 chars; no explicit “when” trigger; longer than Claude Code's old 250-char listing cap (fine on current versions).

About this skill

Turns LLM spend from a surprise into a tracked weekly number. Source of truth: docs/plans/2026-05-09-recovery-hardening-plan.md. Owned by Koren; pairs with token-optimization and Paz's first-try-scoreboard.

Data sources

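  • ai_usage table via mcp__supabase__execute_sql. Designed around per-call token + cost rows, but the ground-truth section below records that it currently stores only household_id / usage_date / message_count, so confirm the columns before relying on it for $.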
  • PostHog LLM-analytics MCP: get-llm-total-costs-for-project, exploring-llm-costs.

The weekly cuts to produce

  1. By stage: Haiku classifier (claude-haiku-4-5-20251001) vs Sonnet reply (claude-sonnet-4-20250514) — input/cached/uncached token split each.
  2. By product: Solo 1:1 vs Family group (normalize chat via split_part(group_id,'@',1); @g.us = Family, bare/@s.whatsapp.net = Solo).
  3. Per household: total monthly LLM cost ÷ active households.
  4. Margin line: per-household cost vs Premium ₪14.90/mo (~$4) and ₪149/yr.
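The product split and the margin line can be sketched in a few lines of Python. This is a minimal illustration, not the skill's implementation: the function names are hypothetical, and the price argument is the approximate ~$4 conversion quoted above.

```python
def product_for_chat(group_id: str) -> str:
    """Classify a chat id: an '@g.us' suffix means Family, anything else is Solo."""
    # partition() returns an empty domain for bare ids, which fall through to Solo.
    _, _, domain = group_id.partition("@")
    return "Family" if domain == "g.us" else "Solo"


def monthly_margin_usd(total_llm_cost_usd: float,
                       active_households: int,
                       price_usd: float) -> float:
    """Margin per household = price - (total monthly LLM cost / active households)."""
    if active_households <= 0:
        raise ValueError("active_households must be positive")
    return price_usd - total_llm_cost_usd / active_households
```

With illustrative inputs (not measured data), monthly_margin_usd(50.0, 100, 4.0) gives a per-household cost of $0.50 and a margin of $3.50.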

Baselines to compare against

  • ~$0.50/household/month two-stage vs ~$1.62 all-Sonnet.
  • 1:1 ~$27.50 / 1K actions (64% silent-Sonnet) vs group ~$16.90 / 1K.

Margin-creep watchlist

Flag any of these when it increases tokens per call:

  • Google-calendar context injection (buildCalendarContextBlock, ≤200 tokens but per qualifying turn).
  • 1:1 conversation-history fetch into the E5 extractor / Sonnet.
  • SHARED_* / prompt growth.
  • Cloud API Meta Utility template fees (~$0.005–0.015/conversation) once proactive sends move off the ambient-chatter subsidy (master-plan H0).
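One way to turn the watchlist into a check is to compare input tokens per call, by stage, between two weeks. The sketch below assumes you already have weekly totals per stage; the StageWeek shape and the 10% threshold are hypothetical choices, not part of the skill.

```python
from dataclasses import dataclass


@dataclass
class StageWeek:
    stage: str          # e.g. "haiku-classifier" or "sonnet-reply"
    calls: int          # number of LLM calls in the week
    input_tokens: int   # total input tokens in the week


def tokens_per_call(week: StageWeek) -> float:
    """Average input tokens per call; 0.0 when the stage had no calls."""
    return week.input_tokens / week.calls if week.calls else 0.0


def flag_token_creep(previous: list[StageWeek],
                     current: list[StageWeek],
                     threshold: float = 0.10) -> list[tuple[str, float]]:
    """Return (stage, growth) for stages whose tokens per call grew beyond threshold."""
    prev_by_stage = {w.stage: tokens_per_call(w) for w in previous}
    flagged = []
    for week in current:
        before = prev_by_stage.get(week.stage)
        if not before:
            continue  # new stage or no calls last week: nothing to compare against
        growth = tokens_per_call(week) / before - 1.0
        if growth > threshold:
            flagged.append((week.stage, growth))
    return flagged
```

Comparing tokens per call rather than total tokens keeps a volume increase from being mistaken for prompt growth.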

Output + the unit metric

Produce a one-page weekly summary; the headline number Koren reports up is the margin (price − per-household cost).

Combine with first-try-scoreboard to report cost per correctly-resolved-first-try message — the real unit. A cheaper bot that resolves less is not cheaper.
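The unit metric is a single division, but writing it out shows why the cheaper bot can lose. The function name and the two example bots are hypothetical; the first-try-resolved count is assumed to come from first-try-scoreboard.

```python
def cost_per_first_try_resolved(total_cost_usd: float, first_try_resolved: int) -> float:
    """LLM spend divided by the number of messages resolved correctly on the first try."""
    if first_try_resolved <= 0:
        return float("inf")  # spend with nothing resolved has no finite unit cost
    return total_cost_usd / first_try_resolved


# Illustrative numbers only: bot B spends less in total but resolves fewer messages.
bot_a = cost_per_first_try_resolved(100.0, 800)
bot_b = cost_per_first_try_resolved(80.0, 500)
```

Here bot_a comes to $0.125 per resolved message and bot_b to $0.16, so the bot with the lower total bill is the more expensive one per unit.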

Running the report

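-- Weekly spend by model and product type.
-- Assumes ai_usage exposes model, group_id, token, and cost_usd columns; confirm the schema first.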
SELECT
  model,
  CASE
    WHEN split_part(group_id, '@', 2) = 'g.us' THEN 'Family'
    ELSE 'Solo'
  END AS product,
  SUM(input_tokens)    AS input_tokens,
  SUM(cached_tokens)   AS cached_tokens,
  SUM(output_tokens)   AS output_tokens,
  SUM(cost_usd)        AS cost_usd
FROM ai_usage
WHERE created_at >= now() - interval '7 days'
GROUP BY 1, 2
ORDER BY cost_usd DESC;

-- Per-household monthly cost
SELECT
  household_id,
  SUM(cost_usd) AS monthly_cost_usd
FROM ai_usage
WHERE created_at >= date_trunc('month', now())
GROUP BY household_id
ORDER BY monthly_cost_usd DESC;

When to run this

When a feature PR lands that touches prompt size or adds an LLM call, run this to size the per-household delta BEFORE it ships at scale.
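A rough way to size that delta before measuring it: multiply the added input tokens per call by the calls per household per month and the input price. This sketch treats the added tokens as uncached input, so it is an upper bound when caching applies. The function name and the 300 calls per month are assumptions; the $3 per million input tokens is the Sonnet rate quoted in this document.

```python
def monthly_delta_per_household_usd(added_input_tokens_per_call: int,
                                    calls_per_household_per_month: int,
                                    input_usd_per_mtok: float) -> float:
    """Extra monthly cost per household from adding uncached input tokens to each call."""
    extra_tokens = added_input_tokens_per_call * calls_per_household_per_month
    return extra_tokens * input_usd_per_mtok / 1_000_000


# Example: a 200-token context block on a hypothetical 300 Sonnet calls per month.
delta = monthly_delta_per_household_usd(200, 300, 3.0)
```

On those assumed inputs the delta is $0.18 per household per month, which is worth comparing against the ~$0.50 two-stage baseline rather than against the price alone.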

Ground-truth tools (the DB has NO tokens/$ — use these instead)

ai_usage only stores household_id / usage_date / message_count, so $ must come from Anthropic, not Supabase. Two scripts cover it:

  • scripts/analyze_token_csvs.py — the workhorse. Export the per-day token CSV from the Console (Usage → download) to ~/Downloads/claude_api_tokens_YYYY_MM.csv, then py -3.13 scripts/analyze_token_csvs.py. Reconciles to the dashboard total and breaks spend down by month × model, month × api_key, and cache read:write savings — how you tell a model swap from a caching effect from token-bloat.
  • scripts/anthropic_usage_report.py — same breakdown live via the Admin Usage/Cost API, IF you have an sk-ant-admin... key. NOTE: individual-tier orgs can't create admin keys (the API-keys page only issues workspace sk-ant-api... keys) — fall back to the CSV script.

Validated baseline (2026-06-08; bot = api_key my-new-key ≈ 100% of spend): the Sonnet 4.6 swap was cost-neutral ($3/$15, same as Sonnet 4); prompt caching saves ~$40/mo (read:write ~1.3×, net-positive); the per-message cost is driven by dynamic per-call context bloat (E5-shadow Haiku + calendar/history injection) and falls as volume grows (cache amortization). Watch cost ÷ first-try-resolved count.
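The "read:write ~1.3×, net-positive" claim can be checked with simple arithmetic. The sketch below assumes Anthropic's standard prompt-caching multipliers for the 5-minute cache (writes billed at 1.25× base input, reads at 0.1×). Verify these against current pricing before relying on the result. The function names are hypothetical.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # assumption: 5-minute cache write price vs base input
CACHE_READ_MULTIPLIER = 0.10   # assumption: cache read price vs base input


def cache_net_saving_usd(read_tokens: int,
                         write_tokens: int,
                         base_input_usd_per_mtok: float) -> float:
    """Saving vs billing the same tokens as plain input; negative means caching cost more."""
    saved_on_reads = read_tokens * (1.0 - CACHE_READ_MULTIPLIER)
    extra_on_writes = write_tokens * (CACHE_WRITE_MULTIPLIER - 1.0)
    return (saved_on_reads - extra_on_writes) * base_input_usd_per_mtok / 1_000_000


def break_even_read_write_ratio() -> float:
    """Read:write token ratio above which caching is net-positive."""
    return (CACHE_WRITE_MULTIPLIER - 1.0) / (1.0 - CACHE_READ_MULTIPLIER)
```

Under these multipliers the break-even ratio is about 0.28, so a read:write ratio of 1.3 sits well on the saving side. For example, 1.3M read tokens against 1M written tokens at $3 per million saves about $2.76.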
