    Repositories list

    • An adaptive sampling framework for Reinforce-style LLM post-training.
      Python
      Updated Nov 29, 2025
    • An adaptive sampling framework for Reinforce-style LLM post-training.
      Python
      Updated Oct 16, 2025
    • GVM
      Public
      Python
      Updated Jul 29, 2025
    • RLHFlow.github.io
      Public
      Webpage for RLHFlow
      HTML
      Updated Jun 20, 2025
    • Minimal-RL
      Public
      Python
      Updated May 14, 2025
    • Recipes to train reward models for RLHF.
      Python
      Updated Apr 24, 2025
    • Codebase for Iterative DPO Using Rule-based Rewards
      Python
      Updated Apr 11, 2025
    • Recipes to train self-rewarding reasoning LLMs.
      Python
      Updated Mar 2, 2025
    • A recipe for online RLHF and online iterative DPO.
      Python
      Updated Dec 28, 2024
    • Directional Preference Alignment
      Updated Sep 23, 2024
    • RAFT
      Public
      Official implementation of the Reward rAnked Fine-Tuning (RAFT) algorithm, also known as iterative best-of-n fine-tuning or rejection sampling fine-tuning.
      Python
      Updated Sep 22, 2024
    • .github
      Public
      Updated May 26, 2024