You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
@@ -160,7 +166,6 @@ and :py:class:`~ray.rllib.connectors.common.add_states_from_episodes_to_batch.Ad
160
166
RLlib only adds the ``state_in`` values for the first timestep in each sequence and therefore also doesn't add a time dimension to the data in the
161
167
``state_in`` column.
162
168
163
-
164
169
.. note::
165
170
166
171
To change the zero-padded sequence length for the :py:class:`~ray.rllib.connectors.common.add_time_dim_to_batch_and_zero_pad.AddTimeDimToBatchAndZeroPad`
One env-to-module pipeline resides on each :py:class:`~ray.rllib.env.env_runner.EnvRunner` and is responsible
32
-
for handling the data flow from the `gymnasium.Env <https://gymnasium.farama.org/api/env/>`__ to
33
-
the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`.
31
+
Learner pipelines
32
+
33
+
Env-to-module pipelines
34
+
=======================
35
+
36
+
.. include:: /_includes/rllib/new_api_stack.rst
37
+
38
+
On each :py:class:`~ray.rllib.env.env_runner.EnvRunner` resides one env-to-module pipeline
39
+
responsible for handling the data flow from the `gymnasium.Env <https://gymnasium.farama.org/api/env/>`__ to the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`.
@@ -43,7 +49,7 @@ the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`.
43
49
.. The module-to-env pipeline serves the other direction, converting the output of the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`, such as action logits and action distribution parameters, to actual actions understandable by the `gymnasium.Env <https://gymnasium.farama.org/api/env/>`__ and used in the env's next `step()` call.
44
50
45
51
The env-to-module pipeline, when called, performs transformations from a list of ongoing :ref:`Episode objects <single-agent-episode-docs>` to an
46
-
RLModule-readable tensor batch and RLlib passes this generated batch as the first argument into the
52
+
``RLModule``-readable tensor batch and RLlib passes this generated batch as the first argument into the
47
53
:py:meth:`~ray.rllib.core.rl_module.rl_module.RLModule.forward_inference` or :py:meth:`~ray.rllib.core.rl_module.rl_module.RLModule.forward_exploration`
48
54
methods of the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`, depending on your exploration settings.
49
55
@@ -61,16 +67,16 @@ methods of the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`, dependi
61
67
Default env-to-module behavior
62
68
------------------------------
63
69
64
-
By default RLlib populates an env-to-module pipeline with the following built-in connector pieces.
70
+
By default RLlib populates every env-to-module pipeline with the following built-in connector pieces.
65
71
66
72
* :py:class:`~ray.rllib.connectors.common.add_observations_from_episodes_to_batch.AddObservationsFromEpisodesToBatch`: Places the most recent observation from each ongoing episode into the batch. The column name is ``obs``. Note that if you have a vector of ``N`` environments per :py:class:`~ray.rllib.env.env_runner.EnvRunner`, your batch size is also ``N``.
67
-
* *Relevant for stateful models only:* :py:class:`~ray.rllib.connectors.common.add_time_dim_to_batch_and_zero_pad.AddTimeDimToBatchAndZeroPad`: If the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` is a stateful one, adds a single timestep, second axis to all data to make it sequential.
68
-
* *Relevant for stateful models only:* :py:class:`~ray.rllib.connectors.common.add_states_from_episodes_to_batch.AddStatesFromEpisodesToBatch`: If the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` is a stateful one, places the most recent state outputs of the module as new state inputs into the batch. The column name is ``state_in``.
73
+
* *Relevant for stateful models only:* :py:class:`~ray.rllib.connectors.common.add_time_dim_to_batch_and_zero_pad.AddTimeDimToBatchAndZeroPad`: If the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` is stateful, adds a single timestep, second axis to all data to make it sequential.
74
+
* *Relevant for stateful models only:* :py:class:`~ray.rllib.connectors.common.add_states_from_episodes_to_batch.AddStatesFromEpisodesToBatch`: If the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` is stateful, places the most recent state outputs of the module as new state inputs into the batch. The column name is ``state_in`` and the values don't have a time-dimension.
69
75
* *For multi-agent only:* :py:class:`~ray.rllib.connectors.common.agent_to_module_mapping.AgentToModuleMapping`: Maps per-agent data to the respective per-module data depending on your defined agent-to-module mapping function.
70
76
* :py:class:`~ray.rllib.connectors.common.batch_individual_items.BatchIndividualItems`: Converts all data in the batch, which thus far are lists of individual items, into batched structures meaning NumPy arrays, whose 0th axis is the batch axis.
71
77
* :py:class:`~ray.rllib.connectors.common.numpy_to_tensor.NumpyToTensor`: Converts all NumPy arrays in the batch into framework specific tensors and moves these to the GPU, if required.
72
78
73
-
You can disable the preceding default connector pieces by setting `config.env_runners(add_default_connectors_to_env_to_module_pipeline=False)`
79
+
You can disable all the preceding default connector pieces by setting `config.env_runners(add_default_connectors_to_env_to_module_pipeline=False)`
74
80
in your :ref:`algorithm config <rllib-algo-configuration-docs>`.
75
81
76
82
Note that the order of these transforms is very relevant for the functionality of the pipeline.
0 commit comments