# Agent RL — Reinforcement Learning for Claude Code

Make your AI agents learn from your feedback. Every generalizable correction becomes a permanent rule, and over weeks the corrections file becomes a methodology.

## What It Does

This skill adds a persistent feedback loop to any Claude Code skill, prompt file, or conversational workflow:

1. **Bootstrap** — Point it at any skill or prompt. It creates memory files, injects a feedback step, and shows you how to give your first 5 rounds of feedback.
2. **Feedback** — After every run, rate the output and give corrections. Generalizable corrections get saved permanently. One-off fixes stay one-off.
3. **Compile** — Review accumulated rules. Deduplicate, resolve contradictions, remove stale entries.
4. **Stats** — See how your agent is improving: rule growth, grade trends, category breakdown.
5. **Remove** — Cleanly remove RL from a skill. Your correction data is preserved.

## How It Works

```
You correct an agent's output
        │
        ▼
Classify: one-off or generalizable?
        │
        ├─ One-off → fix this output, move on
        │
        └─ Generalizable → save to corrections.md
                                    │
                                    ▼
                        Next run reads corrections.md FIRST
                        ─── rules override defaults ───
                                    │
                                    ▼
                        Output improves over time
```
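The classification step above can be sketched in Python. This is an illustrative sketch only — the actual loop runs inside the Claude Code prompt, and `save_if_generalizable` is a hypothetical name, not part of the skill:

```python
from pathlib import Path

def save_if_generalizable(correction: str, generalizable: bool,
                          corrections_file: Path = Path("corrections.md")) -> bool:
    """Append a generalizable correction as a permanent rule; skip one-offs."""
    if not generalizable:
        return False  # one-off: fix this output and move on
    with corrections_file.open("a", encoding="utf-8") as f:
        f.write(f"- {correction}\n")  # persisted: next run reads this file first
    return True
```

The key design choice is the branch itself: only corrections that would apply to future outputs ever touch the memory file, which is what keeps `corrections.md` from bloating.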

## Works With

- **Claude Code skills** — any SKILL.md in `.claude/skills/`
- **Prompt files** — any `.md` or `.txt` used as an agent prompt
- **Inline workflows** — conversational patterns with no file (creates one for you)

## Feedback Channels

| Channel | How it works | Best for |
|---|---|---|
| In Claude Code | Rate + correct inline after each run | Solo users |
| Feedback log | Entries appended to a markdown file | Async review |
| Notion DB | Each output gets a database row | Teams |

## Feedback Depth

| Depth | What gets captured | Time |
|---|---|---|
| Grades only | 1-5 score tracked in grades-log.md | ~5 sec |
| Corrections | Wrong/Right/Pattern format | ~1-2 min |
| Full context | Corrections + voice samples + anti-patterns + vocabulary | ~3-5 min |
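At the "Corrections" depth, an entry in corrections.md might look like the following. The exact layout is defined by the skill; this sample is only illustrative of the Wrong/Right/Pattern idea, with a "Why" line per the design principle that every rule includes its reasoning:

```markdown
## Rule: Drop the hedging
- Wrong: "This might perhaps be a good fit for some teams."
- Right: "This fits teams shipping weekly."
- Pattern: State claims directly; cut "might", "perhaps", "somewhat".
- Why: Hedged copy reads as unsure and buries the point.
```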

## Install

### Option 1: Copy manually

```sh
cp -r .claude/skills/agent-rl ~/.claude/skills/agent-rl
```

### Option 2: Use the install script

```sh
git clone https://github.com/Othmane-Khadri/agent-rl-skill.git
cd agent-rl-skill
chmod +x install.sh
./install.sh
```

## Usage

### Add RL to a skill

"add RL to my linkedin-post skill"

### Give feedback (automatic after each run)

```
"How did this land? (1-5)"  →  you: "3"
"Any corrections?"          →  you: "Too formal, drop the hedging"
"Save as permanent rule?"   →  you: "yes"
```

### Review accumulated rules

"review my RL rules for linkedin-post"

### Check progress

"RL stats"
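As a rough picture of what the Stats mode reports, here is an illustrative grade-trend summary. Both the log format assumed here (one `date: grade` line per entry) and the `grade_trend` function are sketches, not the skill's defined behavior:

```python
import re
from pathlib import Path

def grade_trend(log: Path = Path("grades-log.md")) -> dict:
    """Summarize 1-5 grades from a grades log (assumed format: 'YYYY-MM-DD: N' per line)."""
    grades = [int(m.group(1)) for m in re.finditer(r":\s*([1-5])\b", log.read_text())]
    if not grades:
        return {"count": 0, "average": None, "trend": None}
    half = len(grades) // 2 or 1
    early, late = grades[:half], grades[-half:]  # compare first half vs last half
    return {
        "count": len(grades),
        "average": round(sum(grades) / len(grades), 2),
        "trend": round(sum(late) / len(late) - sum(early) / len(early), 2),
    }
```

A positive `trend` means recent runs grade higher than early ones — the signal that the corrections file is paying off.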

### Remove RL

"remove RL from linkedin-post"

## What Gets Created

When you add RL to a skill:

```
.claude/skills/your-skill/
├── SKILL.md              ← modified (2 steps added, backup saved)
├── rl-config.md          ← configuration and stats
├── corrections.md        ← the memory (grows over time)
├── grades-log.md         ← grade history (if grades-only depth)
├── voice-samples.md      ← what good looks like (if full-context)
├── anti-patterns.md      ← what to avoid (if full-context)
├── vocabulary.md         ← terms to use/avoid (if full-context)
└── *.pre-rl.bak          ← backup of original SKILL.md
```

Agent RL's own self-improving files:

```
.claude/skills/agent-rl/
├── SKILL.md              ← the skill itself
├── corrections.md        ← corrections on how agent-rl works (self-RL)
└── self-rl-log.md        ← usage and grade history
```
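The bootstrap step that produces the layout above could be approximated like this. Only the file names come from the listing; `bootstrap_rl` itself is a hypothetical sketch — the real bootstrap is driven by the skill's prompt, not by code:

```python
from pathlib import Path

def bootstrap_rl(skill_dir: Path) -> None:
    """Back up SKILL.md, then create any missing RL memory files."""
    skill = skill_dir / "SKILL.md"
    backup = skill_dir / "SKILL.md.pre-rl.bak"
    if skill.exists() and not backup.exists():
        backup.write_bytes(skill.read_bytes())  # backup before any modification
    for name in ("rl-config.md", "corrections.md"):
        path = skill_dir / name
        if not path.exists():
            path.write_text(f"# {name}\n", encoding="utf-8")  # empty memory file
```

Note the order: the backup is written before anything else touches the directory, matching the "backups before edits" principle.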

## Your First 5 Sessions

| Session | Focus | What to save |
|---|---|---|
| 1 | The obvious | The correction you'd make every time |
| 2 | The tone | How the output sounds (too formal? too casual?) |
| 3 | The structure | How it's organized (wrong order? missing section?) |
| 4 | The vocabulary | Specific words (jargon? missing terms?) |
| 5 | First compile | Run "review my RL rules" and see your methodology forming |

## Prerequisites

- Claude Code installed
- At least one existing skill, prompt file, or workflow to add RL to

## Self-Improving

Agent RL eats its own dog food. It ships with its own corrections.md and self-rl-log.md. After every mode you run, it asks for quick feedback on how it performed — and saves generalizable corrections to improve itself next time.

Your corrections make the skill better for you. Over time, agent-rl learns how you like to bootstrap, capture feedback, and compile rules.

## Design Principles

1. **Corrections override defaults** — the memory file outranks the base prompt
2. **Library, not checklist** — apply rules with judgment, not mechanically
3. **Reasoning included** — every rule says why, not just what
4. **One-off vs. generalizable** — classifying corrections prevents rule bloat
5. **Confirm before saving** — no silent writes
6. **Backups before edits** — every modification creates a .bak file
7. **Data preserved on removal** — removing RL keeps your corrections
8. **Eats its own dog food** — the skill applies RL to itself
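Principle 1 can be pictured as simple prompt assembly, with the memory file prepended so its rules are read first. This is an illustrative sketch under that assumption — it is not how Claude Code actually composes prompts, and `assemble_prompt` is a hypothetical name:

```python
from pathlib import Path

def assemble_prompt(skill_dir: Path) -> str:
    """Prepend saved corrections to the base skill prompt so rules take precedence."""
    base = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    corrections_file = skill_dir / "corrections.md"
    if not corrections_file.exists():
        return base  # no memory yet: fall back to the base prompt alone
    rules = corrections_file.read_text(encoding="utf-8")
    return (
        "Apply these rules first; they override the defaults below.\n\n"
        f"{rules}\n---\n\n{base}"
    )
```

Putting the rules ahead of the base prompt, with an explicit "override" instruction, is one plausible way to make the memory file outrank the defaults.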

Built by Earleads — GTM Engineering as a Service.
