RLHFlow

12 repositories

Reinforce-Ada
Public
An adaptive sampling framework for Reinforce-style LLM post training.
Python
•
Apache License 2.0
•16•88•0•0•Updated Nov 29, 2025
Reinforce-Ada-Tinker
Public
An adaptive sampling framework for Reinforce-style LLM post training.
Python
•
Apache License 2.0
•0•8•0•0•Updated Oct 16, 2025
GVM
Public
Python
•
Apache License 2.0
•0•16•1•0•Updated Jul 29, 2025
RLHFlow.github.io
Public
Webpage for RLHFlow
HTML
•0•9•0•0•Updated Jun 20, 2025
Minimal-RL
Public
Python
•
Apache License 2.0
•12•260•6•0•Updated May 14, 2025
RLHF-Reward-Modeling
Public
Recipes to train reward model for RLHF.
llm rlhf reward-models llama3
Python
•
Apache License 2.0
•109•1.5k•19•2•Updated Apr 24, 2025
Online-DPO-R1
Public
Codebase for Iterative DPO Using Rule-based Rewards
Python
•34•267•7•0•Updated Apr 11, 2025
Self-rewarding-reasoning-LLM
Public
Recipes to train the self-rewarding reasoning LLMs.
Python
•12•229•2•0•Updated Mar 2, 2025
Online-RLHF
Public
A recipe for online RLHF and online iterative DPO.
llm rlhf llama3
Python
•48•538•12•0•Updated Dec 28, 2024
Directional-Preference-Alignment
Public
Directional Preference Alignment
ai-alignment large-language-models rlhf
Apache License 2.0
•3•58•2•0•Updated Sep 23, 2024
RAFT
Public
This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or rejection sampling fine-tuning.
Python
•5•39•0•0•Updated Sep 22, 2024
.github
Public
0•0•0•0•Updated May 26, 2024