
Commit 38c80fc

[RLlib; docs] Docs do-over (new API stack): ConnectorV2 documentation (part III). (#54626)
1 parent 0cb017e commit 38c80fc

12 files changed (+531, -98 lines)

doc/source/rllib/connector-v2.rst

Lines changed: 20 additions & 15 deletions

@@ -2,16 +2,6 @@

 .. _connector-v2-docs:

-ConnectorV2 and ConnectorV2 pipelines
-=====================================
-
-.. toctree::
-    :hidden:
-
-    env-to-module-connector
-
-.. include:: /_includes/rllib/new_api_stack.rst
-
 .. grid:: 1 2 3 4
     :gutter: 1
     :class-container: container pb-3
@@ -32,6 +22,24 @@ ConnectorV2 and ConnectorV2 pipelines

             Env-to-module pipelines

+    .. grid-item-card::
+        :img-top: /rllib/images/connector_v2/learner_connector.svg
+        :class-img-top: pt-2 w-75 d-block mx-auto fixed-height-img
+
+        .. button-ref:: learner-pipeline-docs
+
+            Learner connector pipelines
+
+ConnectorV2 and ConnectorV2 pipelines
+=====================================
+
+.. toctree::
+    :hidden:
+
+    env-to-module-connector
+    learner-connector
+
+.. include:: /_includes/rllib/new_api_stack.rst

 RLlib stores and transports all trajectory data in the form of :py:class:`~ray.rllib.env.single_agent_episode.SingleAgentEpisode`
 or :py:class:`~ray.rllib.env.multi_agent_episode.MultiAgentEpisode` objects.
@@ -66,8 +74,8 @@ Three ConnectorV2 pipeline types
 There are three different types of connector pipelines in RLlib:

 1) :ref:`Env-to-module pipeline <env-to-module-pipeline-docs>`, which creates tensor batches for action computing forward passes.
-2) Module-to-env pipeline, which translates a model's output into RL environment actions.
-3) Learner connector pipeline, which creates the train batch for a model update.
+2) Module-to-env pipeline (documentation pending), which translates a model's output into RL environment actions.
+3) :ref:`Learner connector pipeline <learner-pipeline-docs>`, which creates the train batch for a model update.

 The :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` API is an extremely powerful tool for
 customizing your RLlib experiments and algorithms. It allows you to take full control over accessing, changing, and re-assembling
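The three pipeline types listed in the hunk above can be pictured as plain functions wired around the model's forward pass. The following is an illustrative sketch only, with hypothetical names, not the actual RLlib ConnectorV2 API:

```python
# Toy sketch of the three connector-pipeline roles (hypothetical names,
# not RLlib classes): episodes -> batch -> model output -> env actions.

def env_to_module(episodes):
    # Build a batch from the most recent observation of each ongoing episode.
    return {"obs": [ep["observations"][-1] for ep in episodes]}

def module_forward(batch):
    # Stand-in for the RLModule forward pass: emit per-item action logits.
    return {"action_logits": [[0.1, 0.9] for _ in batch["obs"]]}

def module_to_env(fwd_out):
    # Translate model output into env actions (greedy argmax here).
    return [max(range(len(l)), key=l.__getitem__) for l in fwd_out["action_logits"]]

episodes = [{"observations": [[0.0, 1.0]]}, {"observations": [[1.0, 0.0]]}]
actions = module_to_env(module_forward(env_to_module(episodes)))
print(actions)  # -> [1, 1]
```

The learner connector pipeline plays the analogous batch-building role on the training side, consuming whole episodes rather than only their latest observations.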
@@ -140,12 +148,10 @@ individual submodules' forward passes using the individual batches under the res
 See :ref:`here for how to write your own multi-module or multi-agent forward logic <implementing-custom-multi-rl-modules>`
 and override this default behavior of :py:class:`~ray.rllib.core.rl_module.multi_rl_module.MultiRLModule`.

-
 Finally, if you have a stateful :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`, for example an LSTM, RLlib adds two additional
 default connector pieces to the pipeline, :py:class:`~ray.rllib.connectors.common.add_time_dim_to_batch_and_zero_pad.AddTimeDimToBatchAndZeroPad`
 and :py:class:`~ray.rllib.connectors.common.add_states_from_episodes_to_batch.AddStatesFromEpisodesToBatch`:

-
 .. figure:: images/connector_v2/pipeline_batch_phases_single_agent_w_states.svg
     :width: 900
     :align: left
@@ -160,7 +166,6 @@ and :py:class:`~ray.rllib.connectors.common.add_states_from_episodes_to_batch.Ad
 RLlib only adds the ``state_in`` values for the first timestep in each sequence and therefore also doesn't add a time dimension to the data in the
 ``state_in`` column.

-
 .. note::

     To change the zero-padded sequence length for the :py:class:`~ray.rllib.connectors.common.add_time_dim_to_batch_and_zero_pad.AddTimeDimToBatchAndZeroPad`

doc/source/rllib/env-to-module-connector.rst

Lines changed: 30 additions & 22 deletions

@@ -2,11 +2,6 @@

 .. _env-to-module-pipeline-docs:

-Env-to-module pipelines
-=======================
-
-.. include:: /_includes/rllib/new_api_stack.rst
-
 .. grid:: 1 2 3 4
     :gutter: 1
     :class-container: container pb-3
@@ -27,10 +22,21 @@ Env-to-module pipelines

             Env-to-module pipelines (this page)

+    .. grid-item-card::
+        :img-top: /rllib/images/connector_v2/learner_connector.svg
+        :class-img-top: pt-2 w-75 d-block mx-auto fixed-height-img
+
+        .. button-ref:: learner-pipeline-docs

-One env-to-module pipeline resides on each :py:class:`~ray.rllib.env.env_runner.EnvRunner` and is responsible
-for handling the data flow from the `gymnasium.Env <https://gymnasium.farama.org/api/env/>`__ to
-the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`.
+            Learner pipelines
+
+Env-to-module pipelines
+=======================
+
+.. include:: /_includes/rllib/new_api_stack.rst
+
+On each :py:class:`~ray.rllib.env.env_runner.EnvRunner` resides one env-to-module pipeline
+responsible for handling the data flow from the `gymnasium.Env <https://gymnasium.farama.org/api/env/>`__ to the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`.

 .. figure:: images/connector_v2/env_runner_connector_pipelines.svg
     :width: 1000
@@ -43,7 +49,7 @@ the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`.
 .. The module-to-env pipeline serves the other direction, converting the output of the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`, such as action logits and action distribution parameters, to actual actions understandable by the `gymnasium.Env <https://gymnasium.farama.org/api/env/>`__ and used in the env's next `step()` call.

 The env-to-module pipeline, when called, performs transformations from a list of ongoing :ref:`Episode objects <single-agent-episode-docs>` to an
-RLModule-readable tensor batch and RLlib passes this generated batch as the first argument into the
+``RLModule``-readable tensor batch and RLlib passes this generated batch as the first argument into the
 :py:meth:`~ray.rllib.core.rl_module.rl_module.RLModule.forward_inference` or :py:meth:`~ray.rllib.core.rl_module.rl_module.RLModule.forward_exploration`
 methods of the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`, depending on your exploration settings.

@@ -61,16 +67,16 @@ methods of the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`, dependi
 Default env-to-module behavior
 ------------------------------

-By default RLlib populates an env-to-module pipeline with the following built-in connector pieces.
+By default RLlib populates every env-to-module pipeline with the following built-in connector pieces.

 * :py:class:`~ray.rllib.connectors.common.add_observations_from_episodes_to_batch.AddObservationsFromEpisodesToBatch`: Places the most recent observation from each ongoing episode into the batch. The column name is ``obs``. Note that if you have a vector of ``N`` environments per :py:class:`~ray.rllib.env.env_runner.EnvRunner`, your batch size is also ``N``.
-* *Relevant for stateful models only:* :py:class:`~ray.rllib.connectors.common.add_time_dim_to_batch_and_zero_pad.AddTimeDimToBatchAndZeroPad`: If the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` is a stateful one, adds a single timestep, second axis to all data to make it sequential.
-* *Relevant for stateful models only:* :py:class:`~ray.rllib.connectors.common.add_states_from_episodes_to_batch.AddStatesFromEpisodesToBatch`: If the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` is a stateful one, places the most recent state outputs of the module as new state inputs into the batch. The column name is ``state_in``.
+* *Relevant for stateful models only:* :py:class:`~ray.rllib.connectors.common.add_time_dim_to_batch_and_zero_pad.AddTimeDimToBatchAndZeroPad`: If the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` is stateful, adds a single timestep, second axis to all data to make it sequential.
+* *Relevant for stateful models only:* :py:class:`~ray.rllib.connectors.common.add_states_from_episodes_to_batch.AddStatesFromEpisodesToBatch`: If the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` is stateful, places the most recent state outputs of the module as new state inputs into the batch. The column name is ``state_in`` and the values don't have a time-dimension.
 * *For multi-agent only:* :py:class:`~ray.rllib.connectors.common.agent_to_module_mapping.AgentToModuleMapping`: Maps per-agent data to the respective per-module data depending on your defined agent-to-module mapping function.
 * :py:class:`~ray.rllib.connectors.common.batch_individual_items.BatchIndividualItems`: Converts all data in the batch, which thus far are lists of individual items, into batched structures meaning NumPy arrays, whose 0th axis is the batch axis.
 * :py:class:`~ray.rllib.connectors.common.numpy_to_tensor.NumpyToTensor`: Converts all NumPy arrays in the batch into framework specific tensors and moves these to the GPU, if required.

-You can disable the preceding default connector pieces by setting `config.env_runners(add_default_connectors_to_env_to_module_pipeline=False)`
+You can disable all the preceding default connector pieces by setting `config.env_runners(add_default_connectors_to_env_to_module_pipeline=False)`
 in your :ref:`algorithm config <rllib-algo-configuration-docs>`.

 Note that the order of these transforms is very relevant for the functionality of the pipeline.
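The order-dependence that the last context line above mentions can be pictured with a toy pipeline. This is a hedged sketch with hypothetical names, not the RLlib ConnectorV2 API: each piece receives the batch the previous piece produced, so item collection has to run before the final batching step, mirroring the default ordering in the hunk above.

```python
# Toy sketch (not the RLlib API) of why connector-piece order matters.

class Pipeline:
    def __init__(self, pieces):
        self.pieces = pieces

    def __call__(self, episodes):
        # Each piece transforms the batch produced by the previous one.
        batch = {}
        for piece in self.pieces:
            batch = piece(episodes, batch)
        return batch

def add_observations(episodes, batch):
    # Collect the most recent observation per episode as a plain list.
    batch["obs"] = [ep[-1] for ep in episodes]
    return batch

def batch_individual_items(episodes, batch):
    # Stack the per-item lists into one structure (tuples stand in
    # for NumPy arrays here).
    return {col: tuple(items) for col, items in batch.items()}

pipeline = Pipeline([add_observations, batch_individual_items])
out = pipeline([[0.1, 0.2], [0.3]])
print(out)  # -> {'obs': (0.2, 0.3)}
```

Swapping the two pieces would make `batch_individual_items` run on a still-empty batch, which is the kind of breakage the fixed default ordering avoids.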
@@ -252,7 +258,7 @@ Writing custom env-to-module connectors
 You can customize the default env-to-module pipeline that RLlib creates through specifying a function in your
 :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig`, which takes an optional RL environment object (`env`) and an optional `spaces`
 dictionary as input arguments and returns a single :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` piece or a list thereof.
-RLlib prepends the provided :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` instances to the
+RLlib prepends these :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` instances to the
 :ref:`default env-to-module pipeline <default-env-to-module-pipeline>` in the order returned,
 unless you set `add_default_connectors_to_env_to_module_pipeline=False` in your config, in which case RLlib exclusively uses the provided
 :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` pieces without any automatically added default behavior.
@@ -373,8 +379,8 @@ Now you can use the custom preprocessor in environments with integer observation

 .. _observation-preprocessors-adding-rewards-to-obs:

-Adding recent rewards to the batch
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Example: Adding recent rewards to the batch
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Assume you wrote a custom :ref:`RLModule <rlmodule-guide>` that requires the last three received
 rewards as input in the calls to any of its `forward_..()` methods.
@@ -396,8 +402,8 @@ there are now three more values in each observation:
     from ray.rllib.connectors.env_to_module.observation_preprocessor import SingleAgentObservationPreprocessor


-    class AddPast3Rewards(SingleAgentObservationPreprocessor):
-        """Extracts last 3 rewards from episode and concatenates them to the observation tensor."""
+    class AddPastThreeRewards(SingleAgentObservationPreprocessor):
+        """Extracts last three rewards from episode and concatenates them to the observation tensor."""

         def recompute_output_observation_space(self, in_obs_space, in_act_space):
             # Based on the input observation space (), return the output observation
@@ -432,8 +438,8 @@ there are now three more values in each observation:
     method.


-Preprocessing observations in multi-agent setups
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Example: Preprocessing observations in multi-agent setups
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 In multi-agent setups, you have two options for preprocessing your agents' individual observations
 through customizing your env-to-module pipeline:
@@ -480,8 +486,8 @@ through customizing your env-to-module pipeline:
     previous actions.


-Adding new columns to the batch
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Example: Adding new columns to the batch
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 So far, you have altered the observations in the input episodes, either by
 :ref:`manipulating them directly <observation-preprocessors>` or
@@ -574,3 +580,5 @@ You should see the new column in the batch, after running through this connector
 Note, though, that if your :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` also requires the new information
 in the train batch, you would also need to add the same custom connector piece to your Algorithm's
 :py:class:`~ray.rllib.connectors.learner.learner_connector_pipeline.LearnerConnectorPipeline`.
+
+See :ref:`the Learner connector pipeline documentation <learner-pipeline-docs>` for more details on how to customize it.
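The reward-concatenation example renamed in the hunks above (`AddPastThreeRewards`) boils down to zero-padded slicing. A minimal plain-Python sketch of that idea, without the RLlib preprocessor API and with a hypothetical function name:

```python
# Minimal sketch of the "last three rewards" idea without RLlib:
# zero-pad the reward history, take its last three entries, and append
# them to the observation vector. Function name is hypothetical.

def add_past_three_rewards(observation, reward_history):
    # Left-pad with zeros so short histories still yield three values.
    last_three = ([0.0] * 3 + list(reward_history))[-3:]
    return list(observation) + last_three

obs = add_past_three_rewards([0.5, -0.5], [1.0])
print(obs)  # -> [0.5, -0.5, 0.0, 0.0, 1.0]
```

The real connector piece additionally widens the observation space in `recompute_output_observation_space` so downstream models see the correct shape.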

doc/source/rllib/images/connector_v2/custom_pieces_in_learner_pipeline.svg

Lines changed: 1 addition & 0 deletions

doc/source/rllib/images/connector_v2/frame_stacking_connector_setup.svg

Lines changed: 1 addition & 0 deletions

doc/source/rllib/images/connector_v2/learner_connector_pipeline.svg

Lines changed: 1 addition & 1 deletion

doc/source/rllib/images/connector_v2/location_of_connector_pipelines_in_rllib.svg

Lines changed: 1 addition & 1 deletion
