
Commit 896f0f5

[RLlib] Remove algorithms from rllib (#30)
* Remove algorithms from rllib Signed-off-by: Avnish <[email protected]>
1 parent 2e1b695 commit 896f0f5

## Summary

We'd like to create the `rllib_contrib` directory inside of `ray-project/ray` for community-contributed algorithms and algorithms with low usage in RLlib. We'd like to start by migrating approximately 25 of the 30 algorithms from RLlib into `rllib_contrib`. We are considering doing this because:

1. Doing so greatly lowers the barrier of entry for community members to contribute new algorithms to RLlib.
2. It would reduce the maintenance burden of RLlib. **By moving any algorithms from RLlib into rllib_contrib, we are breaking the API surface between these algorithms and RLlib. This means that we do not need to update them with any new features or provide any bug fixes to them.**
3. Many of these algorithms [to be migrated to this directory] have successors that are more performant and/or easier to hyperparameter-tune.
4. The proposed algorithms have low usage by the community, according to our Ray cluster telemetry readings.

Each deprecated algorithm will have its own subdirectory in this repo containing the following:

1. The implementation of the algorithm.
2. Any tests that are associated with ensuring the correctness of the algorithm.
3. End-to-end example scripts that demonstrate how to run the algorithm.
4. The necessary requirements for running the algorithm (for example, Ray pinned at a specific version).
5. Boilerplate that allows the algorithm to be installed as a pip package.

## Usage

Users will be able to replace the deprecated algorithms by doing the following:

In most use cases, the migration amounts to replacing imports inside an experiment script.

For example:

```python
from gymnasium.wrappers import TimeLimit

import ray
from ray import air
from ray import tune
from ray.rllib.examples.env.cartpole_mass import CartPoleMassEnv
from ray.tune.registry import register_env

# from ray.rllib.algorithms.maml import MAML, MAMLConfig  # BEFORE
from rllib_maml.maml import MAML, MAMLConfig  # AFTER

if __name__ == "__main__":
    ray.init()
    register_env(
        "cartpole",
        lambda env_cfg: TimeLimit(CartPoleMassEnv(), max_episode_steps=200),
    )

    # Point the config at the env registered above.
    config = MAMLConfig().environment("cartpole")

    tuner = tune.Tuner(
        MAML,
        param_space=config.to_dict(),
        run_config=air.RunConfig(stop={"training_iteration": 100}),
    )
    results = tuner.fit()
```

The proposed algorithms to migrate from RLlib are outlined in the table below, along with their estimated soft and hard deprecation releases:

| Algorithm | Soft Deprecation Release # | Hard Deprecation Release # |
| --- | --- | --- |
| A3C | 2.5 | 2.8 |
| A2C | 2.6 | 2.9 |
| R2D2 | 2.6 | 2.9 |
| MAML | 2.5 | 2.8 |
| AlphaStar | 2.6 | 2.9 |
| AlphaZero | 2.6 | 2.9 |
| ApexDQN | 2.6 | 2.9 |
| ApexDDPG | 2.6 | 2.9 |
| DDPG | 2.6 | 2.9 |
| ARS | 2.6 | 2.9 |
| Bandits | 2.6 | 2.9 |
| CRR | 2.6 | 2.9 |
| DDPPO | 2.6 | 2.9 |
| Dreamer | 2.6 | 2.9 |
| DT | 2.6 | 2.9 |
| ES | 2.6 | 2.9 |
| LeelaChessZero | 2.6 | 2.9 |
| MADDPG | 2.6 | 2.9 |
| MBMPO | 2.6 | 2.9 |
| PG | 2.6 | 2.9 |
| QMix | 2.6 | 2.9 |
| Random | 2.6 | 2.9 |
| SimpleQ | 2.6 | 2.9 |
| SlateQ | 2.6 | 2.9 |
| TD3 | 2.6 | 2.9 |

## Design and Architecture

- We introduce a new directory for deprecated RLlib algorithms and community-contributed algorithms.

### `rllib-contrib` File Structure

```
README.md
algorithm_foo/
    src/
        algorithm_foo/
            __init__.py
            algorithm_foo.py
    tests/
    examples/
    latest_results/
        results.json
        results.csv
        results.pkl
        results.tensorboard
        results.rst
    requirements.txt
    pyproject.toml  <---------- Used for installation
```

### Installation

Either install from PyPI:

`pip install rllib-contrib-ALGORITHM`

or install from source:

1. Use git to clone `ray-project/ray`.
2. Navigate to the directory of the previously deprecated algorithm, e.g. `cd rllib-contrib/maml`.
3. Run `pip install -e .` to install the algorithm's Python module as a pip package.
4. Import the algorithm, its policies, and its config from that package instead of from `ray.rllib`.

## Contribution Guidelines

The RLlib team commits to the following level of support for the algorithms in this repo:

| Platform | Purpose | Support Level |
| --- | --- | --- |
| [Discuss Forum](https://discuss.ray.io) | For discussions about development and questions about usage. | Community |
| [GitHub Issues](https://github.com/ray-project/rllib-contrib-maml/issues) | For reporting bugs and filing feature requests. | Community |
| [Slack](https://forms.gle/9TSdDYUgxYs8SA9e8) | For collaborating with other Ray users. | Community |

**This means that any issues that are filed will be addressed on a best-effort basis by the community, and there is no expectation of maintenance by the RLlib team.**

We will generally accept contributions to this directory that meet any of the following criteria:

1. Updating dependencies.
2. Submitting community-contributed algorithms that have been tested and are ready for use.
3. Enabling algorithms to be run in different environments (e.g. adding support for a new type of gym environment).
4. Updating algorithms for use with the newer RLlib APIs.
5. General bug fixes.

We will not accept contributions that add significant new maintenance burden. In that case, users should instead make their own repo with their contribution, **using the same guidelines as this repo**, and the RLlib team can help market/promote it in the Ray docs.

### Contributing new algorithms

If you would like to contribute a new algorithm to this directory, follow these steps:

1. Create a new directory with the same structure as the other algorithms.
2. Add a `README.md` file that describes the algorithm and its use cases.
3. Create unit tests, short learning tests, and long learning tests for the algorithm.
4. Submit a PR, and an RLlib maintainer will review it and help you set up your testing to integrate with the CI of this repo.

Regarding unit tests and learning tests:

- Unit tests are any tests that test a subcomponent of an algorithm, for example tests that check the value of a loss function given some inputs.
- Short learning tests should run an algorithm on an easy-to-learn environment for a short amount of time (e.g. ~3 minutes) and check that the algorithm reaches some learning threshold (e.g. mean reward or loss); see the sketch after this list.
- Long learning tests should run an algorithm on a hard-to-learn environment for a long amount of time (e.g. ~1 hour) and check that the algorithm reaches some learning threshold (e.g. mean reward or loss).
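
For illustration, here is a minimal sketch of what a short learning test could look like. `rllib_algorithm_foo`, `AlgorithmFooConfig`, the environment, the iteration budget, and the reward threshold are hypothetical placeholders (the config is assumed to follow RLlib's `AlgorithmConfig` API), not part of this proposal:

```python
# Hypothetical short learning test for a contributed algorithm.
# `rllib_algorithm_foo` / `AlgorithmFooConfig` are placeholder names.
import unittest

import ray

from rllib_algorithm_foo.algorithm_foo import AlgorithmFooConfig  # hypothetical


class TestAlgorithmFooShortLearning(unittest.TestCase):
    def setUp(self):
        ray.init(num_cpus=4)

    def tearDown(self):
        ray.shutdown()

    def test_learns_cartpole(self):
        # Train on an easy environment for a few minutes and assert that
        # the mean episode reward crosses a modest threshold.
        config = (
            AlgorithmFooConfig()
            .environment("CartPole-v1")
            .rollouts(num_rollout_workers=1)
        )
        algo = config.build()
        best_reward = float("-inf")
        for _ in range(20):  # keep the total runtime to a few minutes
            results = algo.train()
            best_reward = max(best_reward, results["episode_reward_mean"])
            if best_reward >= 150.0:
                break
        algo.stop()
        self.assertGreaterEqual(best_reward, 150.0)


if __name__ == "__main__":
    unittest.main()
```
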
### Telemetry and Promoting Algorithms to RLlib

In Ray we have telemetry features that allow us to track the usage of algorithms. We'll have to establish a similar telemetry system for this repo. If an algorithm shows considerable usage relative to what we see in RLlib, we can consider promoting it to RLlib and moving its maintenance burden to the RLlib team.

It isn't crucial to add this telemetry system for the initial release of this repo, since we don't expect usage of the repo to be high initially, but it is something that we will add in the future.

### Testing

Testing will leverage the existing Buildkite infrastructure that we have in the OSS repository today. Because we are leveraging the existing OSS testing infrastructure, maintenance will be shared between the RL and dev prod teams. We will set up separate Buildkite jobs for each algorithm, to be run any time there is a change to that specific algorithm. Doing this will ensure that we don't waste compute resources. These jobs will run short-running unit tests and short learning tests for the algorithm. Additionally, we'll have separate Buildkite jobs for long-running learning tests that use many resources. These tests will be triggered manually, and only if we enable them by adding special phrases to commit messages.

## Compatibility, Deprecation, and Migration Plan

### Compatibility

The dependencies of each algorithm are hard-pinned in that algorithm's package. This means that if a user wants to use a newer version of Ray, the burden of migrating the algorithm to the new version of Ray is on the user.

### Deprecation

1. We will keep the deprecated algorithms' classes around in the repo for at least 3 releases. We will add standard deprecation warnings to the algorithms and their components, but after those 3 releases we will raise deprecation errors any time an algorithm or its related policy/model components are used.

2. We will add thorough documentation to https://docs.ray.io/en/latest/rllib/rllib-algorithms.html that explains why the algorithms were deprecated and how to migrate to the new rllib-contrib algorithm Python packages.
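
To make the two deprecation phases concrete, here is a minimal sketch of what such a shim could look like, using MAML as an example. This is illustrative only and does not reproduce RLlib's actual deprecation utilities; the flag name and message text are placeholders:

```python
# Illustrative sketch only: a shim left behind in ray.rllib that warns during
# the soft-deprecation window and raises once hard deprecation is reached.
import warnings

# Flipped to True in the hard-deprecation release.
_HARD_DEPRECATED = False

_MIGRATION_MSG = (
    "MAML has moved to rllib_contrib. Install it with "
    "`pip install rllib-contrib-maml` and use "
    "`from rllib_maml.maml import MAML, MAMLConfig` instead of "
    "`from ray.rllib.algorithms.maml import MAML, MAMLConfig`."
)


class MAML:
    """Placeholder for the old `ray.rllib.algorithms.maml.MAML` class."""

    def __init__(self, *args, **kwargs):
        if _HARD_DEPRECATED:
            raise DeprecationWarning(_MIGRATION_MSG)
        warnings.warn(_MIGRATION_MSG, DeprecationWarning, stacklevel=2)
```
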
## Stewardship

### Shepherd of the Proposal

@sven1977

### Required Reviewers

@kouroshhakha, @gjoliver, @richardliaw, @sven1977, @ArturNiederfahrenhorst