skill-doctor

Reviews your Claude skills so they actually trigger, stay lean, and route cleanly.

A skill that audits other skills. It settles the things a machine can settle — character counts, broken routes, listing budget — with scripts and exit codes, and saves your attention for the calls that need judgment: dead instructions, bloat, exit conditions too vague to act on. Every defect it reports has a name and a line number.

See it run · Quick start · What it catches · How it works · Reference

See It Run

A real diagnosis, lightly trimmed:

$ "review my image-tool skill"

[skill-doctor] Auditing: ~/.claude/skills/image-tool  body=143 lines  description=52 chars
[skill-doctor] Budget: 32 skills installed, 11,515 chars vs ≈40,000 → fits
[skill-doctor] Dry-run prompt: "use image-tool to convert this"  broken links=1

[skill-doctor] Diagnosis

❌ Must fix (P0→P3)
  [P0 effect break]  Step 3 reads "the output file" but Step 2 never names one
                     → have Step 2 write a named artifact Step 3 can pick up
  [P1 structure]     description is first person ("I help you resize…")
                     → rewrite in third person ("Resizes and converts images…")
  [P2 specificity]   "process the image" has no command
                     → "convert to PNG with `sips -s format png`"

⚠️ Suggested
  - [no-op] "be careful and thorough" — the model already is → delete the line
  - [sprawl] body 143 lines; a 40-line example block → move to references/

✅ Passed
  - name regex, routing (0 orphans / 0 dangling), YAML, visible output per step

Here's the thing about a broken skill: nothing throws. It just doesn't fire, or Claude reads half of it and quietly moves on. You find out weeks later when the skill you wrote never once triggered. This is the pass that catches that before you ship.

Quick Start

Point it at any skill — in a Claude session, just ask:

review this skill   ·   审查 / 体检 / 修复 this skill (Chinese triggers: review / check-up / fix)   ·   "check my-tool"

Or run one deterministic check straight from the shell, no Claude needed:

python3 scripts/check_routes.py ~/.claude/skills/my-tool
# exit 0 = clean · exit 1 = orphans / dangling links / over the size cap

It reads the target, runs the mechanical checks, walks one real prompt through the workflow, and prints a P0→P3 report. It does not touch your files until you say "fix it".
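To call the shell check above from a hook or a CI job, a minimal Python sketch might look like the following. The exit-code meanings (0 clean, 1 issues) come from the comment above; the helper name `run_route_check`, the verdict strings, and the injectable `command` parameter are assumptions made for illustration and are not part of skill-doctor. The default command assumes the current directory is the skill-doctor checkout, as in the shell example.

```python
import subprocess
import sys
from pathlib import Path

# Exit codes documented for check_routes.py: 0 = clean, 1 = issues.
VERDICTS = {0: "clean", 1: "issues"}


def run_route_check(skill_dir, command=None):
    """Run a route check and translate its exit code into a verdict.

    `command` defaults to the documented invocation. It is injectable
    (an assumption of this sketch) so the wrapper can be exercised
    without the real script.
    """
    if command is None:
        target = str(Path(skill_dir).expanduser())
        command = [sys.executable, "scripts/check_routes.py", target]
    result = subprocess.run(command, capture_output=True, text=True)
    verdict = VERDICTS.get(result.returncode, f"unexpected exit {result.returncode}")
    return verdict, result.stdout
```

Returning the captured stdout alongside the verdict lets the caller print the script's own report rather than re-deriving it.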

What It Catches

The failures that don't announce themselves:

Symptom you'd never notice	What it flags
Skill never auto-fires	description in first person, too weak, or over the listing budget (silently dropped to name-only)
Claude skips half your instructions	body past ~500 lines — `[sprawl]`
A reference file never gets read	orphan or dangling route, caught by `check_routes.py`
A workflow step gets skipped	the step prints nothing, so the model walks past it
The agent quits a step early	exit condition too vague — `[premature-completion]`
A line that does nothing	the model already obeyed it by default — `[no-op]`

Every check, threshold, and rule is listed in full under Reference.
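The orphan / dangling row in the table can be pictured as a set difference between the reference files on disk and the reference files the routing documents mention. The sketch below is not the actual `check_routes.py`. It assumes routes are written as `references/<name>.md` inside `SKILL.md` and `references/index.md`, and the function name `find_route_problems` is hypothetical.

```python
import re
from pathlib import Path

# Assumption: a route is any mention of "references/<name>.md" in a routing file.
ROUTE = re.compile(r"references/([\w.-]+\.md)")


def find_route_problems(skill_dir):
    """Return (orphans, dangling) for a skill directory.

    orphans  = reference files on disk that no routing file mentions
    dangling = mentioned reference files that do not exist on disk
    """
    root = Path(skill_dir)
    refs_dir = root / "references"
    # index.md is itself a routing file, so it is never counted as an orphan.
    on_disk = {p.name for p in refs_dir.glob("*.md")} - {"index.md"}

    mentioned = set()
    for routing_file in (root / "SKILL.md", refs_dir / "index.md"):
        if routing_file.exists():
            text = routing_file.read_text(encoding="utf-8")
            mentioned |= set(ROUTE.findall(text))
    mentioned.discard("index.md")

    return sorted(on_disk - mentioned), sorted(mentioned - on_disk)
```

Both lists being empty corresponds to the "0 orphans / 0 dangling" line in the sample report.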

How It Works

  your skill  (SKILL.md + references/ + scripts/)
        |
        v
  mechanical checks ........ char / line counts · listing budget
  (scripts, exit-coded)      routing reachability · retrieval eval
        |
        v
  judgment .................. bloat · no-op · sediment · vague exits
  (dimensions, on demand)    loaded only when their condition fires
        |
        v
  dry-run ................... walk one real prompt through the workflow
        |
        v
  P0 -> P3 report .......... must-fix / suggested / passed, each with a line number
        |
        v
  apply fixes .............. only after you confirm

Two design choices make the verdict worth trusting:

Deterministic where it can be. Counts, routing, and budget are settled by scripts that return exit codes, not by an opinion that drifts between runs.
Honest where it can't. If skill-doctor already edited your skill earlier in the same session, it stamps every line of its report [self-review] and tells you to re-check with a fresh agent. It discounts its own grade on purpose.

Reference

Everything below is lookup material. You don't need to read it to use the skill.

Diagnosis flow

Step	Does	Output
1	Read the target SKILL.md; run `check_listing_budget.py` and `check_routes.py`	size + budget + routes verdicts
2	`index.py` rosters every dimension (FORCED / DEFAULT / SKIP, plus a `MODE=` dispatch line); read the fired ones one file at a time	the roster + one receipt per dimension
2.5	Dry-run: walk one prompt per task shape (write / revise / check) through the body	broken links + gates fired per prompt (any break → P0)
2.6	Live-injection: check whether the target's description is actually injected this session or dropped to name-only	injected / dropped / skipped
3	Print the P0→P3 diagnosis (❌ must fix · ⚠️ suggested · ✅ passed)	the report
4	Apply fixes with Edit — only after you confirm	applied-fixes line

Bundled scripts

The scripts use no LLM and no third-party dependencies, with one exception: `eval_retrieval.py` calls a model on request.

Script	Checks	Command	Exit
`index.py`	measures the target and prints the dimension roster (FORCED / DEFAULT / SKIP with reasons) plus the `MODE=` dispatch line	`python3 scripts/index.py <skill_dir>`	0 roster printed / 2 unreadable
`check_routes.py`	reachability, orphans, dangling links, the compass cap (6000 chars, overridable via a `compass-cap:` frontmatter field); `--before <snapshot>` adds a content-conservation check	`python3 scripts/check_routes.py <skill_dir>`	0 clean / 1 issues
`check_listing_budget.py`	total description chars vs the ~1%-of-context listing pool; lists every description over the 300-char band	`python3 scripts/check_listing_budget.py <root>`	0 fits / 1 overflow / 2 unavailable
`check_desc_slim.py`	flags descriptions that can be trimmed without losing trigger strength	`python3 scripts/check_desc_slim.py <skill_dir>`	0 clean / 1 issues
`eval_retrieval.py`	asks a model (GLM, 3× majority vote) whether the router can find each reference by its `when-to-read` scenario	`python3 scripts/eval_retrieval.py <skill_dir> --glm`	0 clean / 1 misses
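The listing-budget row boils down to simple arithmetic: add up the description lengths, compare the total with the pool, and list every description over the band. Here is a minimal sketch under stated assumptions. It takes the budget as a caller-supplied number rather than deriving it from the context size, it counts description text only, and the function name `check_listing_budget` is hypothetical. The real script may measure differently.

```python
def check_listing_budget(descriptions, budget_chars, band=300):
    """Compare total description length against a listing budget.

    descriptions: mapping of skill name -> description text
    budget_chars: listing pool size in characters (supplied by the caller)
    band:         per-description length above which a skill is listed

    Returns (exit_code, total_chars, over_band), reusing the documented
    codes 0 = fits and 1 = overflow.
    """
    total = sum(len(text) for text in descriptions.values())
    over_band = sorted(
        name for name, text in descriptions.items() if len(text) > band
    )
    exit_code = 0 if total <= budget_chars else 1
    return exit_code, total, over_band
```

With the figures from the sample run (11,515 chars against roughly 40,000), this comparison would come out as "fits".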

Hard rules

A violation here is an error, not a suggestion.

Rule	Source
description ≤ 1024 chars (Claude Code ≥ 2.1.105 accepts 1536)	Anthropic spec / CC release notes
listing-budget overflow is always reported — it is a population fact, fixed by a global pass, never by editing one skill	CC issues #56710 / #47627
description in third person	Anthropic best practices
name uses only lowercase letters, digits, hyphens	Anthropic spec
body < 500 lines	Anthropic / community consensus
routing depth ≤ 2 hops (SKILL.md → index.md → file); nested indexes forbidden	structure-surgery (hardened from practice)
zero dangling routes, zero orphan subdocuments	structure-surgery iron rule
description on a single logical YAML line (no `>` or `|` block scalars)
every workflow step produces visible output	Seleznov experiment
a skill's claim about its own performance must point to in-repo evidence, or be cut	evidence-gating
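Several of the rules above are mechanical enough to express in a few lines. The sketch below covers four of them: name characters, description length, third person, and body size. It is an illustration, not the skill's own checker. The function name `check_hard_rules` is hypothetical, and the first-person test is a crude opening-word heuristic assumed for this example.

```python
import re

# Lowercase letters, digits, and hyphens only.
NAME_RE = re.compile(r"[a-z0-9-]+")
# Heuristic (assumption): a first-person description opens with I / my / we.
FIRST_PERSON_RE = re.compile(r"^\s*(I|my|we)\b", re.IGNORECASE)


def check_hard_rules(name, description, body_lines, max_desc=1024, max_body=500):
    """Return a list of hard-rule violations; an empty list means all passed."""
    violations = []
    if not NAME_RE.fullmatch(name):
        violations.append("name: only lowercase letters, digits, hyphens")
    if len(description) > max_desc:
        violations.append(f"description: {len(description)} chars > {max_desc}")
    if FIRST_PERSON_RE.search(description):
        violations.append("description: first person, rewrite in third person")
    if body_lines >= max_body:
        violations.append(f"body: {body_lines} lines, limit is < {max_body}")
    return violations
```

Passing `max_desc=1536` would model the larger limit the table attributes to newer Claude Code versions.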

Failure modes it names

A defect gets a name so you can act on it precisely; the P-tier says how bad, the name says what kind.

[no-op] — a line the model already obeys by default
[sediment] — a stale line that drifted out of relevance
[sprawl] — simply too long, even if every line is live
[duplication] — the same meaning in more than one place
[premature-completion] — an exit condition so vague the agent quits early
[weak-leading-word] — an anchor word too weak to beat the model's default
[no-floor] — no mandatory read-set, so audit depth varies run to run
[gate-gap] — quality gates keyed only to big tasks; the everyday small edit passes zero gates
[missing-license] — an editing skill that never grants rewrite authority, so the model's minimal-diff default wins

Priority tiers

Tier	Meaning	Enters ❌
P0	effect break (the flow doesn't connect when run)	always
P1	violates a structural hard rule	always
P2	insufficient specificity (no command, no I/O spec)	always
P3	readability	only when it changes what the next run does; else ⚠️
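The placement rule in the tier table can be written as a small sorting function: P0 to P2 always land under must-fix, and P3 lands there only when it changes what the next run does. The `Finding` class, its field names, and the `place` function are hypothetical, introduced only to make the rule concrete.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    tier: int                       # 0..3 for P0..P3
    message: str
    changes_next_run: bool = False  # only consulted for P3


def place(findings):
    """Split findings into (must_fix, suggested) following the tier table."""
    must_fix, suggested = [], []
    # sorted() is stable, so findings within a tier keep their input order.
    for finding in sorted(findings, key=lambda f: f.tier):
        if finding.tier <= 2 or finding.changes_next_run:
            must_fix.append(finding)
        else:
            suggested.append(finding)
    return must_fix, suggested
```

Sorting by tier first keeps the must-fix list in P0→P3 order, matching the layout of the sample report.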

Dimensions (rostered on demand by `scripts/index.py`)

scripts/index.py decides which dimension files fire (FORCED / DEFAULT / SKIP); workflow docs load at the step that names them. references/index.md stays the human-readable mirror.

Reference	Covers
`description-templates.md`	trigger-strength template + listing-budget math
`body-quality-checklist.md`	body size and what to demote into references
`visible-output-rule.md`	every workflow step must print something
`yaml-pitfalls.md`	frontmatter formatting traps
`hard-code-vs-llm-judgment.md`	script it, or leave it to the model
`assets-vs-references.md`	sorting templates vs reference docs
`structure-surgery.md`	splitting, routing tiers, the 2-hop cap
`effect-dry-run.md`	walking one prompt per task shape through the workflow
`runtime-contract.md`	load floor, gate granularity, edit license: whether the skill still takes effect when it actually runs
`priority-tiers.md`	P0–P3 definitions and placement
`predictability-glossary.md`	the failure-mode vocabulary
`hard-rules.md`	the quantified must-pass rules
`exception-fallback.md`	what to do when a path or script errors
`language-policy.md`	English-by-default and its exceptions
`output-format.md`	the layered shape of the Step 3 report
`apply-safety.md`	size gate, pre-deletion check, closing the loop
`live-injection-check.md`	is the description actually injected this session
`rationale.md`	why the skill exists at all

Repository structure

skill-doctor/
├── SKILL.md                              # the compass: diagnosis flow + routing
├── references/
│   ├── index.md                          # routes to every dimension by when-to-read
│   ├── apply-safety.md
│   ├── assets-vs-references.md
│   ├── body-quality-checklist.md
│   ├── description-templates.md
│   ├── effect-dry-run.md
│   ├── exception-fallback.md
│   ├── hard-code-vs-llm-judgment.md
│   ├── hard-rules.md
│   ├── language-policy.md
│   ├── live-injection-check.md
│   ├── output-format.md
│   ├── predictability-glossary.md
│   ├── priority-tiers.md
│   ├── rationale.md
│   ├── runtime-contract.md
│   ├── structure-surgery.md
│   ├── visible-output-rule.md
│   └── yaml-pitfalls.md
├── scripts/
│   ├── index.py                          # dimension roster: FORCED / DEFAULT / SKIP + MODE=
│   ├── check_routes.py                   # reachability / orphans / dangling / compass cap
│   ├── check_listing_budget.py           # description listing-budget
│   ├── check_desc_slim.py                # description slimming
│   └── eval_retrieval.py                 # retrieval eval (model-backed)
├── README.md
├── README.zh-CN.md
└── LICENSE

Skills fail silently. This is the pass that makes them fail loudly first.

Author: Zane456 · License: MIT
