llm-cost-report
Weekly LLM spend + unit-economics report: cost by stage (Haiku classifier vs Sonnet reply), by product (Solo 1:1 vs Family group), and per household, plotted against the price. Surfaces margin creep before it becomes a loss. Use to report spend, check the margin, or size the cost impact of a feature.
Install
mkdir -p .claude/skills/llm-cost-report && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/16472" && unzip -o skill.zip -d .claude/skills/llm-cost-report && rm skill.zip
Installs to .claude/skills/llm-cost-report
Activation
This is the description your AI agent reads to decide when to run this skill — the better it matches your request, the more reliably it fires.
Weekly LLM spend + unit-economics report: cost by stage (Haiku classifier vs Sonnet reply), by product (Solo 1:1 vs Family group), and per household, plotted against the price. Surfaces margin creep before it becomes a loss. Use to report spend, check the margin, or size the cost impact of a feature. Triggers: cost report, spend, margin, unit economics, how much per household.
About this skill
Turns LLM spend from a surprise into a tracked weekly number. Source of truth: docs/plans/2026-05-09-recovery-hardening-plan.md. Owned by Koren; pairs with token-optimization and Paz's first-try-scoreboard.
Data sources
- ai_usage table (per-call token + cost rows) via mcp__supabase__execute_sql.
- PostHog LLM-analytics MCP: get-llm-total-costs-for-project, exploring-llm-costs.
The weekly cuts to produce
- By stage: Haiku classifier (claude-haiku-4-5-20251001) vs Sonnet reply (claude-sonnet-4-20250514), with the input/cached/uncached token split for each.
- By product: Solo 1:1 vs Family group (normalize chat via split_part(group_id,'@',1); @g.us = Family, bare/@s.whatsapp.net = Solo).
- Per household: total monthly LLM cost ÷ active households.
- Margin line: per-household cost vs Premium ₪14.90/mo (~$4) and ₪149/yr.
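The product split, per-household cost, and margin line above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual code: the function names, the spend figure, and the household count are hypothetical, and the $4.00 price is the approximate USD value quoted above.

```python
def classify_product(group_id: str) -> str:
    """Map a chat id to a product: an '@g.us' suffix is Family, anything else is Solo."""
    _, _, suffix = group_id.partition("@")
    return "Family" if suffix == "g.us" else "Solo"


def per_household_cost(total_monthly_cost_usd: float, active_households: int) -> float:
    """Total monthly LLM cost divided by active households."""
    if active_households <= 0:
        raise ValueError("active_households must be positive")
    return total_monthly_cost_usd / active_households


# Approximate USD value of the ₪14.90/mo Premium price (an approximation, not an exchange rate).
PREMIUM_PRICE_USD = 4.00

# Hypothetical month: $125 of LLM spend across 250 active households.
cost = per_household_cost(total_monthly_cost_usd=125.0, active_households=250)
margin = PREMIUM_PRICE_USD - cost

print(classify_product("120363000000000000@g.us"))  # Family
print(f"cost/household ${cost:.2f}, margin ${margin:.2f}")
```

The suffix check mirrors the `@g.us` rule in the list; bare ids and `@s.whatsapp.net` ids both fall through to Solo.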
Baselines to compare against
- ~$0.50/household/month two-stage vs ~$1.62 all-Sonnet.
- 1:1 ~$27.50 / 1K actions (64% silent-Sonnet) vs group ~$16.90 / 1K.
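To make the baselines easier to compare, the arithmetic behind them can be worked through as below. The four constants come from the two bullets above; the `drift_vs_baseline` helper and the $0.60 example week are hypothetical additions for illustration.

```python
# Baselines quoted above (USD).
TWO_STAGE_PER_HOUSEHOLD = 0.50
ALL_SONNET_PER_HOUSEHOLD = 1.62
SOLO_PER_1K_ACTIONS = 27.50
GROUP_PER_1K_ACTIONS = 16.90

# Share of the all-Sonnet cost that the two-stage pipeline avoids.
two_stage_saving = 1 - TWO_STAGE_PER_HOUSEHOLD / ALL_SONNET_PER_HOUSEHOLD

# How much more a 1:1 action costs than a group action.
solo_vs_group = SOLO_PER_1K_ACTIONS / GROUP_PER_1K_ACTIONS


def drift_vs_baseline(measured: float, baseline: float) -> float:
    """Relative change of this week's number against its baseline."""
    return measured / baseline - 1


print(f"two-stage saving vs all-Sonnet: {two_stage_saving:.0%}")  # 69%
print(f"Solo vs group cost per action: {solo_vs_group:.2f}x")     # 1.63x
print(f"drift at $0.60/household: {drift_vs_baseline(0.60, TWO_STAGE_PER_HOUSEHOLD):+.0%}")  # +20%
```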
Margin-creep watchlist
Flag when these grow per-call tokens:
- Google-calendar context injection (buildCalendarContextBlock, ≤200 tokens but per qualifying turn).
- 1:1 conversation-history fetch into the E5 extractor / Sonnet.
- SHARED_* / prompt growth.
- Cloud API Meta Utility template fees (~$0.005–0.015/conversation) once proactive sends move off the ambient-chatter subsidy (master-plan H0).
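One way to turn the watchlist into a check is to compare mean input tokens per call week over week and flag growth past a threshold. The sketch below assumes you can get a list of per-call input token counts for each week; the function names, the sample numbers, and the 10% threshold are all hypothetical.

```python
from statistics import mean


def tokens_per_call_growth(previous: list[int], current: list[int]) -> float:
    """Relative change in mean input tokens per call between two weeks."""
    if not previous or not current:
        raise ValueError("both weeks need at least one call")
    return mean(current) / mean(previous) - 1


def should_flag(previous: list[int], current: list[int], threshold: float = 0.10) -> bool:
    """True when per-call tokens grew by more than the (hypothetical) threshold."""
    return tokens_per_call_growth(previous, current) > threshold


# Hypothetical samples: mean 1100 tokens last week, 1300 this week.
last_week = [1000, 1200, 1100]
this_week = [1300, 1250, 1350]

growth = tokens_per_call_growth(last_week, this_week)
print(f"per-call token growth: {growth:+.1%}, flag: {should_flag(last_week, this_week)}")
```

Running the check per stage (classifier vs reply) rather than on the blended total should make it easier to attribute growth to a specific watchlist item.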
Output + the unit metric
Produce a one-page weekly summary; the headline number Koren reports up is the margin (price − per-household cost).
Combine with first-try-scoreboard to report cost per correctly-resolved-first-try message — the real unit. A cheaper bot that resolves less is not cheaper.
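The unit metric is a single division, but a worked example shows why it matters. In this sketch the function name and both bots' numbers are hypothetical; the resolved counts would come from first-try-scoreboard.

```python
def cost_per_first_try_resolved(total_cost_usd: float, resolved_first_try: int) -> float:
    """LLM spend divided by the number of messages resolved correctly on the first try."""
    if resolved_first_try <= 0:
        raise ValueError("resolved_first_try must be positive")
    return total_cost_usd / resolved_first_try


# Hypothetical week: bot B spends less in total but resolves fewer messages first try.
bot_a = cost_per_first_try_resolved(total_cost_usd=20.0, resolved_first_try=800)
bot_b = cost_per_first_try_resolved(total_cost_usd=15.0, resolved_first_try=500)

print(f"bot A: ${bot_a:.3f} per resolved message")  # $0.025
print(f"bot B: ${bot_b:.3f} per resolved message")  # $0.030
```

Bot B has the lower total spend and the higher cost per resolved message, which is the case the paragraph above warns about.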
Running the report
-- Weekly spend by model and product type
-- Assumes ai_usage exposes per-call model, group_id, token, and cost_usd columns
SELECT
    model,
    CASE
        WHEN split_part(group_id, '@', 2) = 'g.us' THEN 'Family'  -- group chat
        ELSE 'Solo'                                               -- bare id or @s.whatsapp.net
    END AS product,
    SUM(input_tokens) AS input_tokens,
    SUM(cached_tokens) AS cached_tokens,
    SUM(output_tokens) AS output_tokens,
    SUM(cost_usd) AS cost_usd
FROM ai_usage
WHERE created_at >= now() - interval '7 days'
GROUP BY 1, 2
ORDER BY cost_usd DESC;
-- Per-household cost, month to date (since the first day of the current month)
SELECT
    household_id,
    SUM(cost_usd) AS monthly_cost_usd
FROM ai_usage
WHERE created_at >= date_trunc('month', now())
GROUP BY household_id
ORDER BY monthly_cost_usd DESC;
When to run this
When a feature PR lands that touches prompt size or adds an LLM call, run this to size the per-household delta BEFORE it ships at scale.
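A rough way to size that delta before shipping is tokens added per call × calls per household per month × the input price. The sketch below uses the $3 per million input tokens Sonnet rate quoted in the validated baseline and the ≤200-token calendar block from the watchlist; the function name and the 300 calls per month are hypothetical. It prices every added token at the uncached rate, so treat the result as an upper bound.

```python
SONNET_INPUT_USD_PER_MTOK = 3.00  # input rate from the "$3/$15" figure in the validated baseline


def per_household_delta_usd(
    added_input_tokens_per_call: int,
    calls_per_household_per_month: int,
    usd_per_mtok: float = SONNET_INPUT_USD_PER_MTOK,
) -> float:
    """Monthly per-household cost added by extra input tokens, priced uncached."""
    tokens = added_input_tokens_per_call * calls_per_household_per_month
    return tokens * usd_per_mtok / 1_000_000


# Hypothetical feature: a 200-token context block on 300 qualifying turns per month.
delta = per_household_delta_usd(added_input_tokens_per_call=200, calls_per_household_per_month=300)
share_of_baseline = delta / 0.50  # against the ~$0.50/household/month two-stage baseline

print(f"delta ${delta:.2f}/household/month ({share_of_baseline:.0%} of baseline)")
```

Under these assumed numbers a block that looks small per call adds about a third to the per-household baseline.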
Ground-truth tools (the DB has NO tokens/$ — use these instead)
ai_usage only stores household_id / usage_date / message_count, so $ must come from Anthropic, not Supabase. Two scripts cover it:
- scripts/analyze_token_csvs.py: the workhorse. Export the per-day token CSV from the Console (Usage → download) to ~/Downloads/claude_api_tokens_YYYY_MM.csv, then run py -3.13 scripts/analyze_token_csvs.py. It reconciles to the dashboard total and breaks spend down by month × model, month × api_key, and cache read:write savings, which is how you tell a model swap from a caching effect from token bloat.
- scripts/anthropic_usage_report.py: the same breakdown live via the Admin Usage/Cost API, if you have an sk-ant-admin... key. Note: individual-tier orgs can't create admin keys (the API-keys page only issues workspace sk-ant-api... keys), so fall back to the CSV script.
Validated baseline (2026-06-08; bot = api_key my-new-key ≈ 100% of spend):
- The Sonnet 4.6 swap was cost-neutral ($3/$15, same as Sonnet 4).
- Prompt caching saves ~$40/mo (read:write ~1.3×, net-positive).
- Per-message cost is driven by dynamic per-call context bloat (E5-shadow Haiku + calendar/history injection) and falls as volume grows (cache amortization).
Watch cost ÷ first-try-resolved count.
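The "net-positive" caching claim can be sanity-checked with a small model. This sketch rests on two assumptions that should be verified against current Anthropic pricing: that cache writes are billed at 1.25× the base input rate and cache reads at 0.1× (the multipliers for the default 5-minute cache), and that "read:write ~1.3×" means cache-read tokens per cache-write token. The function names and token volumes are hypothetical.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # assumed: cache writes cost 25% more than base input
CACHE_READ_MULTIPLIER = 0.10   # assumed: cache reads cost 10% of base input


def net_cache_saving_usd(
    cache_read_tokens: int,
    cache_write_tokens: int,
    base_usd_per_mtok: float = 3.00,
) -> float:
    """Saving versus sending the same tokens uncached at the base input rate."""
    read_saving = cache_read_tokens * (1 - CACHE_READ_MULTIPLIER)
    write_premium = cache_write_tokens * (CACHE_WRITE_MULTIPLIER - 1)
    return (read_saving - write_premium) * base_usd_per_mtok / 1_000_000


def break_even_read_write_ratio() -> float:
    """Read:write token ratio above which caching is net-positive."""
    return (CACHE_WRITE_MULTIPLIER - 1) / (1 - CACHE_READ_MULTIPLIER)


# Hypothetical month at a 1.3 read:write ratio: 13M tokens read, 10M written.
saving = net_cache_saving_usd(cache_read_tokens=13_000_000, cache_write_tokens=10_000_000)

print(f"break-even read:write ratio: {break_even_read_write_ratio():.2f}")
print(f"net saving: ${saving:.2f}")
```

Under these assumptions the break-even ratio is about 0.28, so a ratio near 1.3 sits comfortably on the net-positive side.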