English | 简体中文
Reviews your Claude skills so they actually trigger, stay lean, and route cleanly.
A skill that audits other skills. It settles the things a machine can settle — character counts, broken routes, listing budget — with scripts and exit codes, and saves your attention for the calls that need judgment: dead instructions, bloat, exit conditions too vague to act on. Every defect it reports has a name and a line number.
See it run · Quick start · What it catches · How it works · Reference
A real diagnosis, lightly trimmed:
$ "review my image-tool skill"
[skill-doctor] Auditing: ~/.claude/skills/image-tool body=143 lines description=52 chars
[skill-doctor] Budget: 32 skills installed, 11,515 chars vs ≈40,000 → fits
[skill-doctor] Dry-run prompt: "use image-tool to convert this" broken links=1
[skill-doctor] Diagnosis
❌ Must fix (P0→P3)
[P0 effect break] Step 3 reads "the output file" but Step 2 never names one
→ have Step 2 write a named artifact Step 3 can pick up
[P1 structure] description is first person ("I help you resize…")
→ rewrite in third person ("Resizes and converts images…")
[P2 specificity] "process the image" has no command
→ "convert to PNG with `sips -s format png`"
⚠️ Suggested
- [no-op] "be careful and thorough" — the model already is → delete the line
- [sprawl] body 143 lines; a 40-line example block → move to references/
✅ Passed
- name regex, routing (0 orphans / 0 dangling), YAML, visible output per step
Here's the thing about a broken skill: nothing throws. It just doesn't fire, or Claude reads half of it and quietly moves on. You find out weeks later when the skill you wrote never once triggered. This is the pass that catches that before you ship.
Point it at any skill — in a Claude session, just ask:
review this skill · 审查 / 体检 / 修复 this skill · "check my-tool"
Or run one deterministic check straight from the shell, no Claude needed:
python3 scripts/check_routes.py ~/.claude/skills/my-tool
# exit 0 = clean · exit 1 = orphans / dangling links / over the size capIt reads the target, runs the mechanical checks, walks one real prompt through the workflow, and prints a P0→P3 report. It does not touch your files until you say "fix it".
The failures that don't announce themselves:
| Symptom you'd never notice | What it flags |
|---|---|
| Skill never auto-fires | description in first person, too weak, or over the listing budget (silently dropped to name-only) |
| Claude skips half your instructions | body past ~500 lines — [sprawl] |
| A reference file never gets read | orphan or dangling route, caught by check_routes.py |
| A workflow step gets skipped | the step prints nothing, so the model walks past it |
| The agent quits a step early | exit condition too vague — [premature-completion] |
| A line that does nothing | the model already obeyed it by default — [no-op] |
Every check, threshold, and rule is listed in full under Reference.
your skill (SKILL.md + references/ + scripts/)
|
v
mechanical checks ........ char / line counts · listing budget
(scripts, exit-coded) routing reachability · retrieval eval
|
v
judgment .................. bloat · no-op · sediment · vague exits
(dimensions, on demand) loaded only when their condition fires
|
v
dry-run ................... walk one real prompt through the workflow
|
v
P0 -> P3 report .......... must-fix / suggested / passed, each with a line number
|
v
apply fixes .............. only after you confirm
Two design choices make the verdict worth trusting:
- Deterministic where it can be. Counts, routing, and budget are settled by scripts that return exit codes, not by an opinion that drifts between runs.
- Honest where it can't. If skill-doctor already edited your skill earlier in the same session, it stamps every line of its report
[self-review]and tells you to re-check with a fresh agent. It discounts its own grade on purpose.
Everything below is lookup material. You don't need to read it to use the skill.
| Step | Does | Output |
|---|---|---|
| 1 | Read the target SKILL.md; run check_listing_budget.py and check_routes.py |
size + budget + routes verdicts |
| 2 | index.py rosters every dimension (FORCED / DEFAULT / SKIP, plus a MODE= dispatch line); read the fired ones one file at a time |
the roster + one receipt per dimension |
| 2.5 | Dry-run: walk one prompt per task shape (write / revise / check) through the body | broken links + gates fired per prompt (any break → P0) |
| 2.6 | Live-injection: is the target's description actually injected this session, or dropped name-only | injected / dropped / skipped |
| 3 | Print the P0→P3 diagnosis (❌ must fix · |
the report |
| 4 | Apply fixes with Edit — only after you confirm | applied-fixes line |
No LLM and no third-party dependencies (except eval_retrieval.py, which calls a model on request).
| Script | Checks | Command | Exit |
|---|---|---|---|
index.py |
measures the target and prints the dimension roster (FORCED / DEFAULT / SKIP with reasons) plus the MODE= dispatch line |
python3 scripts/index.py <skill_dir> |
0 roster printed / 2 unreadable |
check_routes.py |
reachability, orphans, dangling links, the compass cap (6000 chars, overridable via a compass-cap: frontmatter field); --before <snapshot> adds a content-conservation check |
python3 scripts/check_routes.py <skill_dir> |
0 clean / 1 issues |
check_listing_budget.py |
total description chars vs the ~1%-of-context listing pool; lists every description over the 300-char band | python3 scripts/check_listing_budget.py <root> |
0 fits / 1 overflow / 2 unavailable |
check_desc_slim.py |
flags descriptions that can be trimmed without losing trigger strength | python3 scripts/check_desc_slim.py <skill_dir> |
0 clean / 1 issues |
eval_retrieval.py |
asks a model (GLM, 3× majority vote) whether the router can find each reference by its when-to-read scenario |
python3 scripts/eval_retrieval.py <skill_dir> --glm |
0 clean / 1 misses |
A violation here is an error, not a suggestion.
| Rule | Source |
|---|---|
| description ≤ 1024 chars (Claude Code ≥ 2.1.105 accepts 1536) | Anthropic spec / CC release notes |
| listing-budget overflow is always reported — it is a population fact, fixed by a global pass, never by editing one skill | CC issues #56710 / #47627 |
| description in third person | Anthropic best practices |
| name uses only lowercase letters, digits, hyphens | Anthropic spec |
| body < 500 lines | Anthropic / community consensus |
| routing depth ≤ 2 hops (SKILL.md → index.md → file); nested indexes forbidden | structure-surgery (hardened from practice) |
| zero dangling routes, zero orphan subdocuments | structure-surgery iron rule |
description on a single logical YAML line (no > or ` |
`) |
| every workflow step produces visible output | Seleznov experiment |
| a skill's claim about its own performance must point to in-repo evidence, or be cut | evidence-gating |
A defect gets a name so you can act on it precisely; the P-tier says how bad, the name says what kind.
[no-op]— a line the model already obeys by default[sediment]— a stale line that drifted out of relevance[sprawl]— simply too long, even if every line is live[duplication]— the same meaning in more than one place[premature-completion]— an exit condition so vague the agent quits early[weak-leading-word]— an anchor word too weak to beat the model's default[no-floor]— no mandatory read-set, so audit depth varies run to run[gate-gap]— quality gates keyed only to big tasks; the everyday small edit passes zero gates[missing-license]— an editing skill that never grants rewrite authority, so the model's minimal-diff default wins
| Tier | Meaning | Enters ❌ |
|---|---|---|
| P0 | effect break (the flow doesn't connect when run) | always |
| P1 | violates a structural hard rule | always |
| P2 | insufficient specificity (no command, no I/O spec) | always |
| P3 | readability | only when it changes what the next run does; else |
scripts/index.py decides which dimension files fire (FORCED / DEFAULT / SKIP); workflow docs load at the step that names them. references/index.md stays the human-readable mirror.
| Reference | Covers |
|---|---|
description-templates.md |
trigger-strength template + listing-budget math |
body-quality-checklist.md |
body size and what to demote into references |
visible-output-rule.md |
every workflow step must print something |
yaml-pitfalls.md |
frontmatter formatting traps |
hard-code-vs-llm-judgment.md |
script it, or leave it to the model |
assets-vs-references.md |
sorting templates vs reference docs |
structure-surgery.md |
splitting, routing tiers, the 2-hop cap |
effect-dry-run.md |
walking one prompt per task shape through the workflow |
runtime-contract.md |
load floor, gate granularity, edit license: whether the skill still takes effect when it actually runs |
priority-tiers.md |
P0–P3 definitions and placement |
predictability-glossary.md |
the failure-mode vocabulary |
hard-rules.md |
the quantified must-pass rules |
exception-fallback.md |
what to do when a path or script errors |
language-policy.md |
English-by-default and its exceptions |
output-format.md |
the layered shape of the Step 3 report |
apply-safety.md |
size gate, pre-deletion check, closing the loop |
live-injection-check.md |
is the description actually injected this session |
rationale.md |
why the skill exists at all |
skill-doctor/
├── SKILL.md # the compass: diagnosis flow + routing
├── references/
│ ├── index.md # routes to every dimension by when-to-read
│ ├── apply-safety.md
│ ├── assets-vs-references.md
│ ├── body-quality-checklist.md
│ ├── description-templates.md
│ ├── effect-dry-run.md
│ ├── exception-fallback.md
│ ├── hard-code-vs-llm-judgment.md
│ ├── hard-rules.md
│ ├── language-policy.md
│ ├── live-injection-check.md
│ ├── output-format.md
│ ├── predictability-glossary.md
│ ├── priority-tiers.md
│ ├── rationale.md
│ ├── runtime-contract.md
│ ├── structure-surgery.md
│ ├── visible-output-rule.md
│ └── yaml-pitfalls.md
├── scripts/
│ ├── index.py # dimension roster: FORCED / DEFAULT / SKIP + MODE=
│ ├── check_routes.py # reachability / orphans / dangling / compass cap
│ ├── check_listing_budget.py # description listing-budget
│ ├── check_desc_slim.py # description slimming
│ └── eval_retrieval.py # retrieval eval (model-backed)
├── README.md
├── README.zh-CN.md
└── LICENSE