Make your AI agents learn from your feedback. Every correction becomes a permanent rule. Over weeks, the corrections file becomes a methodology.
This skill adds a persistent feedback loop to any Claude Code skill, prompt file, or conversational workflow:
- Bootstrap — Point it at any skill or prompt. It creates memory files, injects a feedback step, and shows you how to give your first 5 rounds of feedback.
- Feedback — After every run, rate the output and give corrections. Generalizable corrections get saved permanently. One-off fixes stay one-off.
- Compile — Review accumulated rules. Deduplicate, resolve contradictions, remove stale entries.
- Stats — See how your agent is improving. Rule growth, grade trends, category breakdown.
- Remove — Cleanly remove RL from a skill. Your correction data is preserved.
```
You correct an agent's output
        │
        ▼
Classify: one-off or generalizable?
        │
        ├─ One-off → fix this output, move on
        │
        └─ Generalizable → save to corrections.md
                │
                ▼
    Next run reads corrections.md FIRST
    ─── rules override defaults ───
                │
                ▼
    Output improves over time
```
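The loop above can be sketched in Python. This is an illustration, not the skill's actual implementation (the skill itself is prompt-driven); the file name and the Wrong/Right/Pattern fields match the layout this skill creates, while the function names are hypothetical:

```python
from pathlib import Path

CORRECTIONS = Path("corrections.md")

def save_correction(wrong: str, right: str, pattern: str) -> None:
    """Append a generalizable correction in the Wrong/Right/Pattern format."""
    entry = f"\n- **Wrong:** {wrong}\n  **Right:** {right}\n  **Pattern:** {pattern}\n"
    with CORRECTIONS.open("a", encoding="utf-8") as f:
        f.write(entry)

def build_prompt(base_prompt: str) -> str:
    """Read corrections.md FIRST so saved rules override the base prompt's defaults."""
    if CORRECTIONS.exists():
        rules = CORRECTIONS.read_text(encoding="utf-8")
        return f"Rules (override defaults):\n{rules}\n---\n{base_prompt}"
    return base_prompt

# A one-off fix is applied to the current output and never written to disk;
# only generalizable corrections are saved.
save_correction(
    wrong="opens with a formal greeting",
    right="open with the hook sentence",
    pattern="skip greetings in posts",
)
prompt = build_prompt("Write a LinkedIn post about launch week.")
```

The key design point is the ordering: the saved rules are prepended, so the next run reads them before the base prompt's defaults.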
- Claude Code skills — any SKILL.md in `.claude/skills/`
- Prompt files — any `.md` or `.txt` file used as an agent prompt
- Inline workflows — conversational patterns with no file (creates one for you)
| Channel | How it works | Best for |
|---|---|---|
| In Claude Code | Rate + correct inline after each run | Solo users |
| Feedback log | Entries appended to a markdown file | Async review |
| Notion DB | Each output gets a database row | Teams |

| Depth | What gets captured | Time |
|---|---|---|
| Grades only | 1-5 score tracked in grades-log.md | ~5 sec |
| Corrections | Wrong/Right/Pattern format | ~1-2 min |
| Full context | Corrections + voice samples + anti-patterns + vocabulary | ~3-5 min |
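At the Corrections depth, a saved entry might look like this (a hypothetical example; the Wrong/Right/Pattern fields are the format named in the table above, and the Why line reflects the principle that every rule records its reasoning):

```markdown
- **Wrong:** "We are pleased to announce our new feature."
- **Right:** "We shipped something new this week."
- **Pattern:** Drop formal announcement openers; lead with the plain statement.
- **Why:** Formal openers bury the hook and read as corporate.
```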
```sh
# Option 1: copy the skill into your user-level skills directory
cp -r .claude/skills/agent-rl ~/.claude/skills/agent-rl

# Option 2: clone the repo and run the installer
git clone https://github.com/Othmane-Khadri/agent-rl-skill.git
cd agent-rl-skill
chmod +x install.sh
./install.sh
```

Bootstrap: "add RL to my linkedin-post skill"
```
"How did this land? (1-5)" → you: "3"
"Any corrections?" → you: "Too formal, drop the hedging"
"Save as permanent rule?" → you: "yes"
```
- Compile: "review my RL rules for linkedin-post"
- Stats: "RL stats"
- Remove: "remove RL from linkedin-post"
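The grade-trend part of the stats could be computed along these lines (an illustrative Python sketch, not the skill's actual logic; the parsed list of 1-5 grades from grades-log.md is assumed):

```python
from statistics import mean

def grade_trend(grades: list[int], window: int = 5) -> str:
    """Compare the average of the last `window` grades to the average before it."""
    if len(grades) <= window:
        return "not enough data"
    recent = mean(grades[-window:])
    earlier = mean(grades[:-window])
    if recent > earlier:
        return "improving"
    if recent < earlier:
        return "declining"
    return "flat"

# Grades as they might be parsed from grades-log.md, one 1-5 score per run.
print(grade_trend([2, 3, 3, 3, 4, 4, 4, 5, 4, 5]))  # → improving
```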
When you add RL to a skill:
```
.claude/skills/your-skill/
├── SKILL.md          ← modified (2 steps added, backup saved)
├── rl-config.md      ← configuration and stats
├── corrections.md    ← the memory (grows over time)
├── grades-log.md     ← grade history (if grades-only depth)
├── voice-samples.md  ← what good looks like (if full-context)
├── anti-patterns.md  ← what to avoid (if full-context)
├── vocabulary.md     ← terms to use/avoid (if full-context)
└── *.pre-rl.bak      ← backup of original SKILL.md
```
Agent RL's own self-improving files:
```
.claude/skills/agent-rl/
├── SKILL.md        ← the skill itself
├── corrections.md  ← corrections on how agent-rl works (self-RL)
└── self-rl-log.md  ← usage and grade history
```
| Session | Focus | What to save |
|---|---|---|
| 1 | The obvious | The correction you'd make every time |
| 2 | The tone | How the output sounds (too formal? too casual?) |
| 3 | The structure | How it's organized (wrong order? missing section?) |
| 4 | The vocabulary | Specific words (jargon? missing terms?) |
| 5 | First compile | Run "review my RL rules" and see your methodology forming |
- Claude Code installed
- At least one existing skill, prompt file, or workflow to add RL to
Agent RL eats its own dog food. It ships with its own corrections.md and self-rl-log.md. After every mode you run, it asks for quick feedback on how it performed — and saves generalizable corrections to improve itself next time.
Your corrections make the skill better for you. Over time, agent-rl learns how you like to bootstrap, capture feedback, and compile rules.
- Corrections override defaults — the memory file outranks the base prompt
- Library, not checklist — apply with judgment, not mechanically
- Reasoning included — every rule says why, not just what
- One-off vs. generalizable — prevents rule bloat
- Confirm before saving — no silent writes
- Backups before edits — every modification creates a .bak file
- Data preserved on removal — removing RL keeps your corrections
- Eats its own dog food — the skill applies RL to itself
Built by Earleads — GTM Engineering as a Service.