News • Links • Getting Started • Evaluation • Citation • Acknowledgement
## News

- [2025/05/20] 🎉 We released SkyRL-SQL: a multi-turn RL training pipeline for Text-to-SQL, along with SkyRL-SQL-7B — a model trained on just 653 samples that outperforms both GPT-4o and o4-mini!
- [2025/05/06] 🎉 We released SkyRL-v0: our open RL training pipeline for multi-turn tool-use LLMs, optimized for long-horizon, real-environment tasks like SWE-Bench!
This repository contains the training code for the SkyRL-v0 release. Our implementation is a fork of VeRL.

The repo currently uses the SGLang async rollout feature introduced to VeRL in this draft PR, based on this commit. We will refactor the code soon so that the codebase can easily track the VeRL main branch.
## Getting Started

The first step is to clone our repository:

```bash
git clone --recurse-submodules https://github.com/NovaSky-AI/SkyRL
```

For detailed installation instructions, please refer to INSTALL.md.
To reproduce our results for SkyRL-Agent-14B-v0, SkyRL-Agent-8B-v0, and SkyRL-Agent-7B-v0, refer to `examples/sky/swebench`.

To reproduce our results for SkyRL-SQL-7B, refer to `examples/sky/sql`.
## Evaluation

We report evaluation results on different downstream tasks below, starting with SWE-Bench-Verified.
| Model | Base Model | Base Performance | Trained Performance | Training Time |
|---|---|---|---|---|
| SkyRL-Agent-7B-v0 | OpenHands-7B-Agent | 11% | 14.6% | 16 hrs on 8× H100 |
| SkyRL-Agent-8B-v0 | Qwen3-8B (thinking disabled) | 3.6% | 9.4% | 27 hrs on 8× H200 |
| SkyRL-Agent-14B-v0 | Qwen3-14B (thinking enabled) | 18% | 21.6% | 20 hrs on 8× H200 |
We report evaluation results on a range of Spider benchmarks (evaluated in 5 turns) below.
| Model | Spider-Dev | Spider-Test | Spider-Realistic | Spider-DK | Spider-Syn | Avg |
|---|---|---|---|---|---|---|
| Qwen-2.5-Coder-7B-Instruct | 77.1 | 79.6 | 74.2 | 62.8 | 66.2 | 72.0 |
| o4-mini | 80.6 | 81.8 | 81.2 | 70.8 | 72.1 | 77.3 |
| GPT-4o | 81.3 | 82.4 | 80.1 | 72.1 | 71.9 | 77.6 |
| SkyRL-SQL-7B | 83.9 (+6.8%) | 85.2 (+5.6%) | 81.1 (+6.9%) | 72.0 (+9.2%) | 73.7 (+7.5%) | 79.2 (+7.2%) |
## Acknowledgement

This work was done at the Berkeley Sky Computing Lab, with amazing compute support from Anyscale, Databricks, NVIDIA, Lambda Labs, and AMD.

Huge thanks to the contributors of the SGLang async rollout feature in VeRL: Hancheng Zhang, Rui Lu, and Haoran Wang from Tsinghua University, and Xiang Long from OpenBMB/ModelBest.

We would also like to thank Ying Sheng and Chenyang Zhao from the SGLang team for supporting the SGLang async rollout integration, and Kaichao You and Simon Mo from the vLLM team for supporting vLLM performance optimization.
## Citation

The code in this repository is mostly described in the posts below. Please consider citing this work if you find the repository helpful.
```bibtex
@misc{cao2025skyrl,
  title = {SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning},
  author = {Shiyi Cao and Sumanth Hegde and Dacheng Li and Tyler Griggs and Shu Liu and Eric Tang and Jiayi Pan and Xingyao Wang and Akshay Malik and Graham Neubig and Kourosh Hakhamaneshi and Richard Liaw and Philipp Moritz and Matei Zaharia and Joseph E. Gonzalez and Ion Stoica},
  year = {2025},
}

@misc{liu2025skyrlsql,
  title = {SkyRL-SQL: Matching GPT-4o and o4-mini on Text2SQL with Multi-Turn RL},
  author = {Shu Liu and Sumanth Hegde and Shiyi Cao and Alan Zhu and Dacheng Li and Tyler Griggs and Eric Tang and Akshay Malik and Kourosh Hakhamaneshi and Richard Liaw and Philipp Moritz and Matei Zaharia and Joseph E. Gonzalez and Ion Stoica},
  year = {2025},
}
```