fix-annotation-from-issues

Name: fix-annotation-from-issues
Author: cboulanger

Use when a GitHub issue reports a missed, spurious, or wrong-element annotation. Reproduces the failure with debug_annotation.py, diagnoses the root cause, and improves schema descriptions or rules.

Install

mkdir -p .claude/skills/fix-annotation-from-issues && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/14630" && unzip -o skill.zip -d .claude/skills/fix-annotation-from-issues && rm skill.zip

Installs to .claude/skills/fix-annotation-from-issues

About this skill

Fix annotation failures reported in GitHub issues by reproducing them with debug_annotation.py, diagnosing the root cause, and improving TEIElement descriptions or TEISchema.rules in the relevant schema file.

This skill is complementary to optimize-element-descriptions, which improves descriptions by maximising F1 against a gold standard. Use this skill when the failure is known from a user-reported issue rather than from an evaluation run. The description-writing guidelines and schema file locations are the same — see docs/tei-element-descriptions.md and optimize-element-descriptions.

Extra arguments (e.g. --provider kisski) are forwarded to debug_annotation.py where applicable.

Step 1 — Fetch the issue

Pull the issue body and all comments so you have every failing snippet:

gh issue view $ISSUE --comments

Issues may contain:

XML source text — a <bibl> block as it appears in the gold file; this is the input to the annotator.
XML labelled text — the annotator's actual output; this is what went wrong.
Screenshots — diff images embedded as <img> tags. Screenshots cannot be read programmatically, but the surrounding text usually describes the error. Note their position in the issue for context.

Extract every <bibl> block from the issue that demonstrates a failure. Strip the outer <bibl>…</bibl> wrapper to get the raw text the annotator sees.
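The extraction described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill: the function name `extract_bibl_texts` and the `strip_inner_tags` option are hypothetical, and the regex assumes `<bibl>` elements are not nested. Whether inner markup should also be removed depends on what the issue author pasted, so that step is optional here.

```python
import re

# Matches one <bibl> element, with or without attributes.
# Assumption: <bibl> elements are not nested inside each other.
BIBL_RE = re.compile(r"<bibl\b[^>]*>(.*?)</bibl>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def extract_bibl_texts(issue_text: str, strip_inner_tags: bool = False) -> list[str]:
    """Return the content of every <bibl> block with the outer wrapper removed."""
    bodies = [m.group(1).strip() for m in BIBL_RE.finditer(issue_text)]
    if strip_inner_tags:
        # Hypothetical option: drop remaining markup so only plain text is left.
        bodies = [TAG_RE.sub("", body) for body in bodies]
    return bodies


# Made-up issue text, for illustration only.
issue = 'Source:\n<bibl n="1">Smith, J. <title>On Spans</title>, 2001.</bibl>\nOutput was wrong.'
print(extract_bibl_texts(issue))
print(extract_bibl_texts(issue, strip_inner_tags=True))
```

Each returned string can then be passed to `--text` in Step 2.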

Step 2 — Reproduce with `debug_annotation.py`

Run the failing snippet through the full pipeline:

uv run scripts/debug_annotation.py \
    --text "<raw text from issue>" \
    --show-prompt \
    [--provider gemini] [--schema bibl-reference-segmenter]

The script prints every pipeline stage. Read top-to-bottom and identify the first stage where the output diverges from what is expected:

| Stage | What to look for | Likely fix |
| --- | --- | --- |
| Parsed spans | Wrong element, wrong text, or missing span | Improve element description or schema rules |
| Resolved spans | Span parsed correctly but not resolved | Prompt or context instructions |
| Validated spans | Resolved but rejected | Element name or attribute value list wrong |
| Final XML | All spans correct but nesting wrong | `inject_xml` / injector issue |

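Edit schema descriptions or rules only when the first divergence is at the Parsed spans stage. A failure that first appears at a later stage needs the fix listed for that stage in the table, not a description change.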
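The "first stage that diverges" rule can be written down as a small lookup. The sketch below assumes you have transcribed the expected and actual output of each stage into dictionaries by hand; `first_divergence` and the dictionary layout are hypothetical and are not something debug_annotation.py provides.

```python
# Stage order and likely fixes, copied from the table above.
STAGES = [
    ("Parsed spans", "Improve element description or schema rules"),
    ("Resolved spans", "Prompt or context instructions"),
    ("Validated spans", "Element name or attribute value list wrong"),
    ("Final XML", "inject_xml / injector issue"),
]


def first_divergence(expected: dict, actual: dict):
    """Return (stage, likely_fix) for the first stage whose output differs, else None."""
    for stage, fix in STAGES:
        if expected.get(stage) != actual.get(stage):
            return stage, fix
    return None


# Hypothetical per-stage outputs, transcribed by hand from a debug run.
expected = {"Parsed spans": [("author", "Smith, J.")], "Resolved spans": [("author", 0, 9)]}
actual = {"Parsed spans": [("editor", "Smith, J.")], "Resolved spans": [("editor", 0, 9)]}
print(first_divergence(expected, actual))
```

Because the loop stops at the first mismatch, a wrong element at the parsing stage is reported even though later stages also differ as a consequence.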

Step 3 — Diagnose the failure pattern

Compare the issue's "source text" against its "labelled text" (or the screenshot description) and the debug output. Classify the failure:

| Pattern | Typical cause |
| --- | --- |
| Span emitted as wrong element | Missing negative constraint in description |
| Required span entirely missing | Trigger condition or surface-form example absent |
| Span boundary too wide or too narrow | Boundary rule not stated |
| Multiple references merged into one span | No "one span per …" instruction |
| Parent span missing around child span | Parent–child relationship not described from both sides |
| Span emitted for only part of the reference | Instruction to cover the full reference absent |

Check whether the same pattern appears in multiple comments or multiple issues. A cross-issue pattern belongs in TEISchema.rules, not in a single element description.
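For single-span failures, the first three patterns can be told apart mechanically by comparing an expected span with the one the annotator produced. The following is a rough sketch under stated assumptions: the `Span` type, the character-offset convention, and the `classify` function are all hypothetical, and the "shifted" label is an extra case that the table does not list.

```python
from typing import NamedTuple, Optional


class Span(NamedTuple):
    element: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


def classify(expected: Span, actual: Optional[Span]) -> str:
    """Map an expected/actual span pair onto a failure pattern name."""
    if actual is None:
        return "Required span entirely missing"
    if (actual.start, actual.end) != (expected.start, expected.end):
        if actual.start <= expected.start and actual.end >= expected.end:
            return "Span boundary too wide"
        if actual.start >= expected.start and actual.end <= expected.end:
            return "Span boundary too narrow"
        return "Span boundary shifted"  # partial overlap; not a row in the table
    if actual.element != expected.element:
        return "Span emitted as wrong element"
    return "match"


want = Span("author", 0, 10)
print(classify(want, Span("editor", 0, 10)))
print(classify(want, Span("author", 0, 15)))
print(classify(want, None))
```

The merged-reference and parent-child patterns involve more than one span and are easier to judge by eye.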

Step 4 — Improve descriptions

Read the current schema file, then edit it following the guidelines in docs/tei-element-descriptions.md.

Key principles (summary — see the full guidelines for detail):

Phrase everything as "emit a span", not "wrap in a tag".
State multiplicity explicitly: "a separate span for each distinct …"
Describe parent–child relationships from both sides with a concrete example.
Add negative constraints: "never tag X as Y".
Include textual triggers (keywords, position) and inline surface-form examples.
Prefix critical constraints with CRITICAL:.
Cross-element patterns → TEISchema.rules (rendered before all element descriptions); single-element patterns → the element's description.

Only edit descriptions for elements directly implicated by the failure.
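Some of the principles above are mechanical enough to check before re-running the pipeline. The sketch below is a hypothetical helper, not part of the repository: `lint_description` applies crude keyword heuristics to a description string and does not replace reading docs/tei-element-descriptions.md.

```python
import re


def lint_description(description: str) -> list[str]:
    """Return heuristic warnings for a description; an empty list means none fired."""
    text = description.lower()
    warnings = []
    if "wrap" in text and "tag" in text:
        warnings.append('phrase as "emit a span", not "wrap in a tag"')
    if "span" not in text:
        warnings.append("does not mention emitting a span")
    if not re.search(r"\bnever\b|\bdo not\b", text):
        warnings.append("no negative constraint")
    if not re.search(r"\beach\b|\bevery\b", text):
        warnings.append("multiplicity not stated")
    if not re.search(r"e\.g\.|for example", text):
        warnings.append("no surface-form example")
    return warnings


# Made-up descriptions, for illustration only.
good = 'Emit a separate span for each distinct author name, e.g. "Smith, J.". CRITICAL: never tag an editor as author.'
bad = "Wrap the author in a tag."
print(lint_description(good))
print(lint_description(bad))
```

A clean result only means the obvious omissions are absent; whether the description actually fixes the failure is decided in Step 5.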

Step 5 — Re-run the debug script

Re-run debug_annotation.py on every failing snippet extracted in Step 1:

uv run scripts/debug_annotation.py \
    --text "<same raw text>" \
    [--provider gemini] [--schema bibl-reference-segmenter]

Confirm that the parsed spans now match the expected annotation. If the fix introduced a regression on a different snippet, diagnose and resolve before continuing.
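When an issue contains several snippets, re-running them in a loop avoids skipping one. A minimal sketch, assuming the script path and the flags shown in this document (`--text`, `--show-prompt`, `--provider`, `--schema`); the helper names `build_debug_command` and `rerun_all` are hypothetical.

```python
import shlex
import subprocess
from typing import Optional


def build_debug_command(
    text: str,
    provider: Optional[str] = None,
    schema: Optional[str] = None,
    show_prompt: bool = False,
) -> list[str]:
    """Build the argument list for one debug_annotation.py run."""
    cmd = ["uv", "run", "scripts/debug_annotation.py", "--text", text]
    if show_prompt:
        cmd.append("--show-prompt")
    if provider:
        cmd += ["--provider", provider]
    if schema:
        cmd += ["--schema", schema]
    return cmd


def rerun_all(snippets: list[str], **options) -> None:
    """Run every snippet in turn and stop at the first non-zero exit status."""
    for snippet in snippets:
        cmd = build_debug_command(snippet, **options)
        print("$", shlex.join(cmd))
        subprocess.run(cmd, check=True)


# Only prints the command; call rerun_all([...]) inside the repository to execute it.
print(shlex.join(build_debug_command("Smith, J. On Spans, 2001.", provider="gemini")))
```

Passing the text as a list element rather than through a shell string sidesteps quoting problems with snippets that contain quotes or angle brackets.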

Step 6 — Check for regressions on the gold standard

Run a targeted evaluation over records whose text overlaps with the fixed snippet to catch regressions:

uv run scripts/evaluate_llm.py --verbose --match-mode overlap \
    --grep "keyword_from_fixed_text" [--provider gemini]

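If F1 on the matched records is unchanged or improved compared with a run before the change, treat the fix as safe. If it drops, inspect the verbose output for the records that regressed before continuing.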
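For reference, F1 is the harmonic mean of precision and recall and can be computed directly from span counts. The sketch below shows the arithmetic behind the before/after comparison. The counts are invented, and how evaluate_llm.py reports its numbers is not assumed here.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no spans at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def is_safe(before: float, after: float, tolerance: float = 0.0) -> bool:
    """True when F1 did not drop by more than the (hypothetical) tolerance."""
    return after >= before - tolerance


# Invented counts: the fix turns one missed span into a correct one.
before = f1_score(tp=8, fp=2, fn=2)
after = f1_score(tp=9, fp=2, fn=1)
print(round(before, 3), round(after, 3), is_safe(before, after))
```

With a small `--grep` subset a single span can move F1 noticeably, so a drop is a prompt to look at the individual records rather than proof of a regression.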

Step 7 — Close or comment on the issue

Once the fix is confirmed:

# Add a resolution comment
gh issue comment $ISSUE --body "Fixed in <commit hash>: <one-line description of the change>"

# Close the issue if fully resolved
gh issue close $ISSUE

If only some examples in the issue are fixed, leave the remaining failing snippets as a comment and keep the issue open.
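For a partial fix, the comment body can be assembled from the snippets that still fail. This is a sketch only: `build_followup_comment` and `build_comment_command` are hypothetical helpers, the commit hash in the example is a placeholder, and the command mirrors the `gh issue comment` call shown above.

```python
def build_followup_comment(commit: str, summary: str, remaining: list[str]) -> str:
    """Compose a resolution comment, listing any snippets that still fail."""
    lines = [f"Fixed in {commit}: {summary}"]
    if remaining:
        lines.append("")
        lines.append("Still failing:")
        lines.extend(f"- {snippet}" for snippet in remaining)
    return "\n".join(lines)


def build_comment_command(issue: int, body: str) -> list[str]:
    """Argument list for posting the comment with the GitHub CLI."""
    return ["gh", "issue", "comment", str(issue), "--body", body]


# Placeholder commit hash and snippet, for illustration only.
body = build_followup_comment(
    "abc1234",
    "added negative constraint to the author description",
    ["Smith, J. (ed.), On Spans, 2001."],
)
print(body)
```

Leave out `gh issue close` when the list of remaining snippets is not empty.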

When to stop

Stop and flag for human review if:

The failure persists after two rounds of description changes and the issue appears across multiple model families — this may be a fundamental model reasoning limit rather than a prompt-quality problem.
The expected annotation is itself ambiguous (either split or merged would be defensible). In that case, consider adding a cert="low" span to the gold file as described in optimize-element-descriptions Step 5a.
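The two stop conditions can be stated as a single predicate. This is a sketch under stated assumptions: `should_stop` and its parameters are hypothetical, and "multiple model families" is read as two or more.

```python
def should_stop(
    rounds_tried: int,
    failing_model_families: set[str],
    gold_is_ambiguous: bool,
) -> bool:
    """True when the failure should be flagged for human review."""
    # Condition 1: two rounds of description changes, still failing across families.
    persistent = rounds_tried >= 2 and len(failing_model_families) >= 2
    # Condition 2: the expected annotation itself is debatable.
    return persistent or gold_is_ambiguous


# The family names below are invented labels.
print(should_stop(2, {"family-a", "family-b"}, gold_is_ambiguous=False))
print(should_stop(1, {"family-a"}, gold_is_ambiguous=False))
```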

License

MIT
