diff --git a/com.unity.ml-agents/Documentation~/API-Reference.md b/com.unity.ml-agents/Documentation~/API-Reference.md new file mode 100644 index 0000000000..556fbc8b5d --- /dev/null +++ b/com.unity.ml-agents/Documentation~/API-Reference.md @@ -0,0 +1,20 @@ +# API Reference + +Our developer-facing C# classes have been documented to be compatible with +Doxygen for auto-generating HTML documentation. + +To generate the API reference, download Doxygen and run the following command +within the `docs/` directory: + +```sh +doxygen dox-ml-agents.conf +``` + +`dox-ml-agents.conf` is a Doxygen configuration file for the ML-Agents Toolkit +that includes the classes that have been properly formatted. The generated HTML +files will be placed in the `html/` subdirectory. Open `index.html` within that +subdirectory to navigate to the API reference home. Note that `html/` is already +included in the repository's `.gitignore` file. + +In the near future, we aim to expand our documentation to include the Python +classes. diff --git a/com.unity.ml-agents/Documentation~/Background-Machine-Learning.md b/com.unity.ml-agents/Documentation~/Background-Machine-Learning.md new file mode 100644 index 0000000000..e64011efdc --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Background-Machine-Learning.md @@ -0,0 +1,195 @@ +# Background: Machine Learning + +Given that a number of users of the ML-Agents Toolkit might not have a formal +machine learning background, this page provides an overview to facilitate the +understanding of the ML-Agents Toolkit. However, we will not attempt to provide +a thorough treatment of machine learning as there are fantastic resources +online. + +Machine learning, a branch of artificial intelligence, focuses on learning +patterns from data. The three main classes of machine learning algorithms +include: unsupervised learning, supervised learning and reinforcement learning. +Each class of algorithm learns from a different type of data. The following +paragraphs provide an overview for each of these classes of machine learning, as +well as introductory examples. + +## Unsupervised Learning + +The goal of +[unsupervised learning](https://en.wikipedia.org/wiki/Unsupervised_learning) is +to group or cluster similar items in a data set. For example, consider the +players of a game. We may want to group the players depending on how engaged +they are with the game. This would enable us to target different groups (e.g. +for highly-engaged players we might invite them to be beta testers for new +features, while for unengaged players we might email them helpful tutorials). +Say that we wish to split our players into two groups. We would first define +basic attributes of the players, such as the number of hours played, total money +spent on in-app purchases and number of levels completed. We can then feed this +data set (three attributes for every player) to an unsupervised learning +algorithm where we specify the number of groups to be two. The algorithm would +then split the data set of players into two groups where the players within each +group would be similar to each other. Given the attributes we used to describe +each player, in this case, the output would be a split of all the players into +two groups, where one group would semantically represent the engaged players and +the second group would semantically represent the unengaged players. + +With unsupervised learning, we did not provide specific examples of which +players are considered engaged and which are considered unengaged. 
We just +defined the appropriate attributes and relied on the algorithm to uncover the +two groups on its own. This type of data set is typically called an unlabeled +data set as it is lacking these direct labels. Consequently, unsupervised +learning can be helpful in situations where these labels can be expensive or +hard to produce. In the next paragraph, we overview supervised learning +algorithms which accept input labels in addition to attributes. + +## Supervised Learning + +In [supervised learning](https://en.wikipedia.org/wiki/Supervised_learning), we +do not want to just group similar items but directly learn a mapping from each +item to the group (or class) that it belongs to. Returning to our earlier +example of clustering players, let's say we now wish to predict which of our +players are about to churn (that is stop playing the game for the next 30 days). +We can look into our historical records and create a data set that contains +attributes of our players in addition to a label indicating whether they have +churned or not. Note that the player attributes we use for this churn prediction +task may be different from the ones we used for our earlier clustering task. We +can then feed this data set (attributes **and** label for each player) into a +supervised learning algorithm which would learn a mapping from the player +attributes to a label indicating whether that player will churn or not. The +intuition is that the supervised learning algorithm will learn which values of +these attributes typically correspond to players who have churned and not +churned (for example, it may learn that players who spend very little and play +for very short periods will most likely churn). Now given this learned model, we +can provide it the attributes of a new player (one that recently started playing +the game) and it would output a _predicted_ label for that player. This +prediction is the algorithms expectation of whether the player will churn or +not. We can now use these predictions to target the players who are expected to +churn and entice them to continue playing the game. + +As you may have noticed, for both supervised and unsupervised learning, there +are two tasks that need to be performed: attribute selection and model +selection. Attribute selection (also called feature selection) pertains to +selecting how we wish to represent the entity of interest, in this case, the +player. Model selection, on the other hand, pertains to selecting the algorithm +(and its parameters) that perform the task well. Both of these tasks are active +areas of machine learning research and, in practice, require several iterations +to achieve good performance. + +We now switch to reinforcement learning, the third class of machine learning +algorithms, and arguably the one most relevant for the ML-Agents Toolkit. + +## Reinforcement Learning + +[Reinforcement learning](https://en.wikipedia.org/wiki/Reinforcement_learning) +can be viewed as a form of learning for sequential decision making that is +commonly associated with controlling robots (but is, in fact, much more +general). Consider an autonomous firefighting robot that is tasked with +navigating into an area, finding the fire and neutralizing it. At any given +moment, the robot perceives the environment through its sensors (e.g. camera, +heat, touch), processes this information and produces an action (e.g. move to +the left, rotate the water hose, turn on the water). 
In other words, it is +continuously making decisions about how to interact in this environment given +its view of the world (i.e. sensors input) and objective (i.e. neutralizing the +fire). Teaching a robot to be a successful firefighting machine is precisely +what reinforcement learning is designed to do. + +More specifically, the goal of reinforcement learning is to learn a **policy**, +which is essentially a mapping from **observations** to **actions**. An +observation is what the robot can measure from its **environment** (in this +case, all its sensory inputs) and an action, in its most raw form, is a change +to the configuration of the robot (e.g. position of its base, position of its +water hose and whether the hose is on or off). + +The last remaining piece of the reinforcement learning task is the **reward +signal**. The robot is trained to learn a policy that maximizes its overall rewards. When training a robot to be a mean firefighting machine, we provide it +with rewards (positive and negative) indicating how well it is doing on +completing the task. Note that the robot does not _know_ how to put out fires +before it is trained. It learns the objective because it receives a large +positive reward when it puts out the fire and a small negative reward for every +passing second. The fact that rewards are sparse (i.e. may not be provided at +every step, but only when a robot arrives at a success or failure situation), is +a defining characteristic of reinforcement learning and precisely why learning +good policies can be difficult (and/or time-consuming) for complex environments. + +
*Figure: The reinforcement learning lifecycle.*
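To make the loop in the figure concrete, here is a toy sketch of the observe-decide-act-reward cycle. It is purely illustrative and uses no ML-Agents APIs: the environment is a one-dimensional corridor with a fire at a fixed position, the observation is the distance to the fire, the actions are a step left or right, and the reward is sparse (+1.0 for reaching the fire, -0.01 for every passing step). The untrained policy below simply acts randomly; a reinforcement learning algorithm would replace it with one that maximizes the collected reward.

```python
import random

# Toy stand-ins for the concepts above (not ML-Agents code):
# a 1D corridor with a fire at position 10.
FIRE_POSITION = 10

def policy(observation):
    # An untrained policy: choose an action (step left or right) at random.
    return random.choice([-1, +1])

position = 0
total_reward = 0.0
for step in range(1000):
    observation = FIRE_POSITION - position   # what the agent perceives
    action = policy(observation)             # the policy maps observation -> action
    position += action                       # the action changes the environment
    # Sparse reward: a large positive reward on success, a small penalty per step.
    reward = 1.0 if position == FIRE_POSITION else -0.01
    total_reward += reward
    if position == FIRE_POSITION:
        break
```

Training amounts to adjusting `policy` so that, over many such episodes, the total reward it collects increases.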
+ +[Learning a policy](https://blogs.unity3d.com/2017/08/22/unity-ai-reinforcement-learning-with-q-learning/) +usually requires many trials and iterative policy updates. More specifically, +the robot is placed in several fire situations and over time learns an optimal +policy which allows it to put out fires more effectively. Obviously, we cannot +expect to train a robot repeatedly in the real world, particularly when fires +are involved. This is precisely why the use of +[Unity as a simulator](https://blogs.unity3d.com/2018/01/23/designing-safer-cities-through-simulations/) +serves as the perfect training grounds for learning such behaviors. While our +discussion of reinforcement learning has centered around robots, there are +strong parallels between robots and characters in a game. In fact, in many ways, +one can view a non-playable character (NPC) as a virtual robot, with its own +observations about the environment, its own set of actions and a specific +objective. Thus it is natural to explore how we can train behaviors within Unity +using reinforcement learning. This is precisely what the ML-Agents Toolkit +offers. The video linked below includes a reinforcement learning demo showcasing +training character behaviors using the ML-Agents Toolkit. + +

*(Video: RL Demo, a reinforcement learning demo of training character behaviors with the ML-Agents Toolkit.)*

+ +Similar to both unsupervised and supervised learning, reinforcement learning +also involves two tasks: attribute selection and model selection. Attribute +selection is defining the set of observations for the robot that best help it +complete its objective, while model selection is defining the form of the policy +(mapping from observations to actions) and its parameters. In practice, training +behaviors is an iterative process that may require changing the attribute and +model choices. + +## Training and Inference + +One common aspect of all three branches of machine learning is that they all +involve a **training phase** and an **inference phase**. While the details of +the training and inference phases are different for each of the three, at a +high-level, the training phase involves building a model using the provided +data, while the inference phase involves applying this model to new, previously +unseen, data. More specifically: + +- For our unsupervised learning example, the training phase learns the optimal + two clusters based on the data describing existing players, while the + inference phase assigns a new player to one of these two clusters. +- For our supervised learning example, the training phase learns the mapping + from player attributes to player label (whether they churned or not), and the + inference phase predicts whether a new player will churn or not based on that + learned mapping. +- For our reinforcement learning example, the training phase learns the optimal + policy through guided trials, and in the inference phase, the agent observes + and takes actions in the wild using its learned policy. + +To briefly summarize: all three classes of algorithms involve training and +inference phases in addition to attribute and model selections. What ultimately +separates them is the type of data available to learn from. In unsupervised +learning our data set was a collection of attributes, in supervised learning our +data set was a collection of attribute-label pairs, and, lastly, in +reinforcement learning our data set was a collection of +observation-action-reward tuples. + +## Deep Learning + +[Deep learning](https://en.wikipedia.org/wiki/Deep_learning) is a family of +algorithms that can be used to address any of the problems introduced above. +More specifically, they can be used to solve both attribute and model selection +tasks. Deep learning has gained popularity in recent years due to its +outstanding performance on several challenging machine learning tasks. One +example is [AlphaGo](https://en.wikipedia.org/wiki/AlphaGo), a +[computer Go](https://en.wikipedia.org/wiki/Computer_Go) program, that leverages +deep learning, that was able to beat Lee Sedol (a Go world champion). + +A key characteristic of deep learning algorithms is their ability to learn very +complex functions from large amounts of training data. This makes them a natural +choice for reinforcement learning tasks when a large amount of data can be +generated, say through the use of a simulator or engine such as Unity. By +generating hundreds of thousands of simulations of the environment within Unity, +we can learn policies for very complex environments (a complex environment is +one where the number of observations an agent perceives and the number of +actions they can take are large). Many of the algorithms we provide in ML-Agents +use some form of deep learning, built on top of the open-source library, +[PyTorch](Background-PyTorch.md). 
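To make the training/inference distinction concrete for the unsupervised learning example above (grouping players into two clusters based on three attributes), here is a small sketch. It uses scikit-learn's `KMeans` purely for illustration; scikit-learn is not part of the ML-Agents Toolkit, and the attribute values are made up.

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row describes one existing player: [hours played, money spent, levels completed].
existing_players = np.array([
    [120.0, 45.0, 30.0],
    [  3.0,  0.0,  2.0],
    [200.0, 80.0, 55.0],
    [  1.5,  0.0,  1.0],
])

# Training phase: learn two clusters from the existing players.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(existing_players)

# Inference phase: assign a previously unseen player to one of the learned clusters.
new_player = np.array([[2.0, 0.0, 1.0]])
print(model.predict(new_player))  # cluster index 0 or 1
```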
diff --git a/com.unity.ml-agents/Documentation~/Background-PyTorch.md b/com.unity.ml-agents/Documentation~/Background-PyTorch.md new file mode 100644 index 0000000000..b78e77c558 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Background-PyTorch.md @@ -0,0 +1,35 @@ +# Background: PyTorch + +As discussed in our +[machine learning background page](Background-Machine-Learning.md), many of the +algorithms we provide in the ML-Agents Toolkit leverage some form of deep +learning. More specifically, our implementations are built on top of the +open-source library [PyTorch](https://pytorch.org/). In this page we +provide a brief overview of PyTorch and TensorBoard +that we leverage within the ML-Agents Toolkit. + +## PyTorch + +[PyTorch](https://pytorch.org/) is an open source library for +performing computations using data flow graphs, the underlying representation of +deep learning models. It facilitates training and inference on CPUs and GPUs in +a desktop, server, or mobile device. Within the ML-Agents Toolkit, when you +train the behavior of an agent, the output is a model (.onnx) file that you can +then associate with an Agent. Unless you implement a new algorithm, the use of +PyTorch is mostly abstracted away and behind the scenes. + +## TensorBoard + +One component of training models with PyTorch is setting the values of +certain model attributes (called _hyperparameters_). Finding the right values of +these hyperparameters can require a few iterations. Consequently, we leverage a +visualization tool called +[TensorBoard](https://www.tensorflow.org/tensorboard). +It allows the visualization of certain agent attributes (e.g. reward) throughout +training which can be helpful in both building intuitions for the different +hyperparameters and setting the optimal values for your Unity environment. We +provide more details on setting the hyperparameters in the +[Training ML-Agents](Training-ML-Agents.md) page. If you are unfamiliar with +TensorBoard we recommend our guide on +[using TensorBoard with ML-Agents](Using-Tensorboard.md) or this +[tutorial](https://github.com/dandelionmane/tf-dev-summit-tensorboard-tutorial). diff --git a/com.unity.ml-agents/Documentation~/Background-Unity.md b/com.unity.ml-agents/Documentation~/Background-Unity.md new file mode 100644 index 0000000000..7d144e53da --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Background-Unity.md @@ -0,0 +1,19 @@ +# Background: Unity + +If you are not familiar with the [Unity Engine](https://unity3d.com/unity), we +highly recommend the [Unity Manual](https://docs.unity3d.com/Manual/index.html) +and [Tutorials page](https://unity3d.com/learn/tutorials). The +[Roll-a-ball tutorial](https://learn.unity.com/project/roll-a-ball) +is a fantastic resource to learn all the basic concepts of Unity to get started +with the ML-Agents Toolkit: + +- [Editor](https://docs.unity3d.com/Manual/UsingTheEditor.html) +- [Scene](https://docs.unity3d.com/Manual/CreatingScenes.html) +- [GameObject](https://docs.unity3d.com/Manual/GameObjects.html) +- [Rigidbody](https://docs.unity3d.com/ScriptReference/Rigidbody.html) +- [Camera](https://docs.unity3d.com/Manual/Cameras.html) +- [Scripting](https://docs.unity3d.com/Manual/ScriptingSection.html) +- [Physics](https://docs.unity3d.com/Manual/PhysicsSection.html) +- [Ordering of event functions](https://docs.unity3d.com/Manual/ExecutionOrder.html) + (e.g. 
FixedUpdate, Update) +- [Prefabs](https://docs.unity3d.com/Manual/Prefabs.html) diff --git a/com.unity.ml-agents/Documentation~/CODE_OF_CONDUCT.md b/com.unity.ml-agents/Documentation~/CODE_OF_CONDUCT.md new file mode 100644 index 0000000000..14a6a4a839 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/CODE_OF_CONDUCT.md @@ -0,0 +1 @@ +{!../CODE_OF_CONDUCT.md!} diff --git a/com.unity.ml-agents/Documentation~/CONTRIBUTING.md b/com.unity.ml-agents/Documentation~/CONTRIBUTING.md new file mode 100644 index 0000000000..6629acde5f --- /dev/null +++ b/com.unity.ml-agents/Documentation~/CONTRIBUTING.md @@ -0,0 +1,36 @@ +# How to Contribute to ML-Agents + +## 1.Fork the repository +Fork the ML-Agents repository by clicking on the "Fork" button in the top right corner of the GitHub page. This creates a copy of the repository under your GitHub account. + +## 2. Set up your development environment +Clone the forked repository to your local machine using Git. Install the necessary dependencies and follow the instructions provided in the project's documentation to set up your development environment properly. + +## 3. Choose an issue or feature +Browse the project's issue tracker or discussions to find an open issue or feature that you would like to contribute to. Read the guidelines and comments associated with the issue to understand the requirements and constraints. + +## 4. Make your changes +Create a new branch for your changes based on the main branch of the ML-Agents repository. Implement your code changes or add new features as necessary. Ensure that your code follows the project's coding style and conventions. + +* Example: Let's say you want to add support for a new type of reward function in the ML-Agents framework. You can create a new branch named feature/reward-function to implement this feature. + +## 5. Test your changes +Run the appropriate tests to ensure your changes work as intended. If necessary, add new tests to cover your code and verify that it doesn't introduce regressions. + +* Example: For the reward function feature, you would write tests to check different scenarios and expected outcomes of the new reward function. + +## 6. Submit a pull request +Push your branch to your forked repository and submit a pull request (PR) to the ML-Agents main repository. Provide a clear and concise description of your changes, explaining the problem you solved or the feature you added. + +* Example: In the pull request description, you would explain how the new reward function works, its benefits, and any relevant implementation details. + +## 7. Respond to feedback +Be responsive to any feedback or comments provided by the project maintainers. Address the feedback by making necessary revisions to your code and continue the discussion if required. + +## 8. Continuous integration and code review +The ML-Agents project utilizes automated continuous integration (CI) systems to run tests on pull requests. Address any issues flagged by the CI system and actively participate in the code review process by addressing comments from reviewers. + +## 9. Merge your changes +Once your pull request has been approved and meets all the project's requirements, a project maintainer will merge your changes into the main repository. Congratulations, your contribution has been successfully integrated! + +**Remember to always adhere to the project's code of conduct, be respectful, and follow any specific contribution guidelines provided by the ML-Agents project. 
Happy contributing!** diff --git a/com.unity.ml-agents/Documentation~/Custom-GridSensors.md b/com.unity.ml-agents/Documentation~/Custom-GridSensors.md new file mode 100644 index 0000000000..88eb8be579 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Custom-GridSensors.md @@ -0,0 +1,74 @@ +# Custom Grid Sensors + +Grid Sensor provides a 2D observation that detects objects around an agent from a top-down view. Compared to RayCasts, it receives a full observation in a grid area without gaps, and the detection is not blocked by objects around the agents. This gives a more granular view while requiring a higher usage of compute resources. + +One extra feature with Grid Sensors is that you can derive from the Grid Sensor base class to collect custom data besides the object tags, to include custom attributes as observations. This allows more flexibility for the use of GridSensor. + +## Creating Custom Grid Sensors +To create a custom grid sensor, you'll need to derive from two classes: `GridSensorBase` and `GridSensorComponent`. + +## Deriving from `GridSensorBase` +This is the implementation of your sensor. This defines how your sensor process detected colliders, +what the data looks like, and how the observations are constructed from the detected objects. +Consider overriding the following methods depending on your use case: +* `protected virtual int GetCellObservationSize()`: Return the observation size per cell. Default to `1`. +* `protected virtual void GetObjectData(GameObject detectedObject, int tagIndex, float[] dataBuffer)`: Constructs observations from the detected object. The input provides the detected GameObject and the index of its tag (0-indexed). The observations should be written to the given `dataBuffer` and the buffer size is defined in `GetCellObservationSize()`. This data will be gathered from each cell and sent to the trainer as observation. +* `protected virtual bool IsDataNormalized()`: Return whether the observation is normalized to 0~1. This affects whether you're able to use compressed observations as compressed data only supports normalized data. Return `true` if all the values written in `GetObjectData` are within the range of (0, 1), otherwise return `false`. Default to `false`. + + There might be cases when your data is not in the range of (0, 1) but you still wish to use compressed data to speed up training. If your data is naturally bounded within a range, normalize your data first to the possible range and fill the buffer with normalized data. For example, since the angle of rotation is bounded within `0 ~ 360`, record an angle `x` as `x/360` instead of `x`. If your data value is not bounded (position, velocity, etc.), consider setting a reasonable min/max value and use that to normalize your data. +* `protected internal virtual ProcessCollidersMethod GetProcessCollidersMethod()`: Return the method to process colliders detected in a cell. This defines the sensor behavior when multiple objects with detectable tags are detected within a cell. + Currently two methods are provided: + * `ProcessCollidersMethod.ProcessClosestColliders` (Default): Process the closest collider to the agent. In this case each cell's data is represented by one object. + * `ProcessCollidersMethod.ProcessAllColliders`: Process all detected colliders. This is useful when the data from each cell is additive, for instance, the count of detected objects in a cell. 
When using this option, the input `dataBuffer` in `GetObjectData()` will contain processed data from other colliders detected in the cell. You'll more likely want to add/subtract values from the buffer instead of overwrite it completely. + +## Deriving from `GridSensorComponent` +To create your sensor, you need to override the sensor component and add your sensor to the creation. +Specifically, you need to override `GetGridSensors()` and return an array of grid sensors you want to use in the component. +It can be used to create multiple different customized grid sensors, or you can also include the ones provided in our package (listed in the next section). + +Example: +```csharp +public class CustomGridSensorComponent : GridSensorComponent +{ + protected override GridSensorBase[] GetGridSensors() + { + return new GridSensorBase[] { new CustomGridSensor(...)}; + } +} +``` + +## Grid Sensor Types +Here we list out two types of grid sensor provided in the package: `OneHotGridSensor` and `CountingGridSensor`. +Their implementations are also a good reference for making you own ones. + +### OneHotGridSensor +This is the default sensor used by `GridSensorComponent`. It detects objects with detectable tags and the observation is the one-hot representation of the detected tag index. + +The implementation of the sensor is defined as following: +* `GetCellObservationSize()`: `detectableTags.Length` +* `IsDataNormalized()`: `true` +* `ProcessCollidersMethod()`: `ProcessCollidersMethod.ProcessClosestColliders` +* `GetObjectData()`: + +```csharp +protected override void GetObjectData(GameObject detectedObject, int tagIndex, float[] dataBuffer) +{ + dataBuffer[tagIndex] = 1; +} +``` + +### CountingGridSensor +This is an example of using all colliders detected in a cell. It counts the number of objects detected for each detectable tag. The sensor cannot be used with data compression. + +The implementation of the sensor is defined as following: +* `GetCellObservationSize()`: `detectableTags.Length` +* `IsDataNormalized()`: `false` +* `ProcessCollidersMethod()`: `ProcessCollidersMethod.ProcessAllColliders` +* `GetObjectData()`: + +```csharp +protected override void GetObjectData(GameObject detectedObject, int tagIndex, float[] dataBuffer) +{ + dataBuffer[tagIndex] += 1; +} +``` diff --git a/com.unity.ml-agents/Documentation~/Custom-SideChannels.md b/com.unity.ml-agents/Documentation~/Custom-SideChannels.md new file mode 100644 index 0000000000..dab58c3d79 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Custom-SideChannels.md @@ -0,0 +1,229 @@ +# Custom Side Channels + +You can create your own side channel in C# and Python and use it to communicate +custom data structures between the two. This can be useful for situations in +which the data to be sent is too complex or structured for the built-in +`EnvironmentParameters`, or is not related to any specific agent, and therefore +inappropriate as an agent observation. + +## Overview + +In order to use a side channel, it must be implemented as both Unity and Python +classes. + +### Unity side + +The side channel will have to implement the `SideChannel` abstract class and the +following method. + +- `OnMessageReceived(IncomingMessage msg)` : You must implement this method and + read the data from IncomingMessage. The data must be read in the order that it + was written. + +The side channel must also assign a `ChannelId` property in the constructor. The +`ChannelId` is a Guid (or UUID in Python) used to uniquely identify a side +channel. 
This Guid must be the same on C# and Python. There can only be one side +channel of a certain id during communication. + +To send data from C# to Python, create an `OutgoingMessage` instance, add data +to it, call the `base.QueueMessageToSend(msg)` method inside the side channel, +and call the `OutgoingMessage.Dispose()` method. + +To register a side channel on the Unity side, call +`SideChannelManager.RegisterSideChannel` with the side channel as only argument. + +### Python side + +The side channel will have to implement the `SideChannel` abstract class. You +must implement : + +- `on_message_received(self, msg: "IncomingMessage") -> None` : You must + implement this method and read the data from IncomingMessage. The data must be + read in the order that it was written. + +The side channel must also assign a `channel_id` property in the constructor. +The `channel_id` is a UUID (referred in C# as Guid) used to uniquely identify a +side channel. This number must be the same on C# and Python. There can only be +one side channel of a certain id during communication. + +To assign the `channel_id` call the abstract class constructor with the +appropriate `channel_id` as follows: + +```python +super().__init__(my_channel_id) +``` + +To send a byte array from Python to C#, create an `OutgoingMessage` instance, +add data to it, and call the `super().queue_message_to_send(msg)` method inside +the side channel. + +To register a side channel on the Python side, pass the side channel as argument +when creating the `UnityEnvironment` object. One of the arguments of the +constructor (`side_channels`) is a list of side channels. + +## Example implementation + +Below is a simple implementation of a side channel that will exchange ASCII +encoded strings between a Unity environment and Python. + +### Example Unity C# code + +The first step is to create the `StringLogSideChannel` class within the Unity +project. Here is an implementation of a `StringLogSideChannel` that will listen +for messages from python and print them to the Unity debug log, as well as send +error messages from Unity to python. + +```csharp +using UnityEngine; +using Unity.MLAgents; +using Unity.MLAgents.SideChannels; +using System.Text; +using System; + +public class StringLogSideChannel : SideChannel +{ + public StringLogSideChannel() + { + ChannelId = new Guid("621f0a70-4f87-11ea-a6bf-784f4387d1f7"); + } + + protected override void OnMessageReceived(IncomingMessage msg) + { + var receivedString = msg.ReadString(); + Debug.Log("From Python : " + receivedString); + } + + public void SendDebugStatementToPython(string logString, string stackTrace, LogType type) + { + if (type == LogType.Error) + { + var stringToSend = type.ToString() + ": " + logString + "\n" + stackTrace; + using (var msgOut = new OutgoingMessage()) + { + msgOut.WriteString(stringToSend); + QueueMessageToSend(msgOut); + } + } + } +} +``` + +Once we have defined our custom side channel class, we need to ensure that it is +instantiated and registered. This can typically be done wherever the logic of +the side channel makes sense to be associated, for example on a MonoBehaviour +object that might need to access data from the side channel. Here we show a +simple MonoBehaviour object which instantiates and registers the new side +channel. If you have not done it already, make sure that the MonoBehaviour which +registers the side channel is attached to a GameObject which will be live in +your Unity scene. 
+ +```csharp +using UnityEngine; +using Unity.MLAgents; + + +public class RegisterStringLogSideChannel : MonoBehaviour +{ + + StringLogSideChannel stringChannel; + public void Awake() + { + // We create the Side Channel + stringChannel = new StringLogSideChannel(); + + // When a Debug.Log message is created, we send it to the stringChannel + Application.logMessageReceived += stringChannel.SendDebugStatementToPython; + + // The channel must be registered with the SideChannelManager class + SideChannelManager.RegisterSideChannel(stringChannel); + } + + public void OnDestroy() + { + // De-register the Debug.Log callback + Application.logMessageReceived -= stringChannel.SendDebugStatementToPython; + if (Academy.IsInitialized){ + SideChannelManager.UnregisterSideChannel(stringChannel); + } + } + + public void Update() + { + // Optional : If the space bar is pressed, raise an error ! + if (Input.GetKeyDown(KeyCode.Space)) + { + Debug.LogError("This is a fake error. Space bar was pressed in Unity."); + } + } +} +``` + +### Example Python code + +Now that we have created the necessary Unity C# classes, we can create their +Python counterparts. + +```python +from mlagents_envs.environment import UnityEnvironment +from mlagents_envs.side_channel.side_channel import ( + SideChannel, + IncomingMessage, + OutgoingMessage, +) +import numpy as np +import uuid + + +# Create the StringLogChannel class +class StringLogChannel(SideChannel): + + def __init__(self) -> None: + super().__init__(uuid.UUID("621f0a70-4f87-11ea-a6bf-784f4387d1f7")) + + def on_message_received(self, msg: IncomingMessage) -> None: + """ + Note: We must implement this method of the SideChannel interface to + receive messages from Unity + """ + # We simply read a string from the message and print it. + print(msg.read_string()) + + def send_string(self, data: str) -> None: + # Add the string to an OutgoingMessage + msg = OutgoingMessage() + msg.write_string(data) + # We call this method to queue the data we want to send + super().queue_message_to_send(msg) +``` + +We can then instantiate the new side channel, launch a `UnityEnvironment` with +that side channel active, and send a series of messages to the Unity environment +from Python using it. + +```python +# Create the channel +string_log = StringLogChannel() + +# We start the communication with the Unity Editor and pass the string_log side channel as input +env = UnityEnvironment(side_channels=[string_log]) +env.reset() +string_log.send_string("The environment was reset") + +group_name = list(env.behavior_specs.keys())[0] # Get the first group_name +group_spec = env.behavior_specs[group_name] +for i in range(1000): + decision_steps, terminal_steps = env.get_steps(group_name) + # We send data to Unity : A string with the number of Agent at each + string_log.send_string( + f"Step {i} occurred with {len(decision_steps)} deciding agents and " + f"{len(terminal_steps)} terminal agents" + ) + env.step() # Move the simulation forward + +env.close() +``` + +Now, if you run this script and press `Play` the Unity Editor when prompted, the +console in the Unity Editor will display a message at every Python step. +Additionally, if you press the Space Bar in the Unity Engine, a message will +appear in the terminal. 
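One last note on the ordering requirement mentioned earlier: data must be read from an `IncomingMessage` in exactly the order it was written to the `OutgoingMessage`. The sketch below illustrates this on the Python side with a few typed values; it assumes, as in the `mlagents_envs` implementation, that `OutgoingMessage` exposes its raw `buffer` and provides `write_int32`/`write_float32`/`write_string`, with matching `read_*` methods on `IncomingMessage`. A C# receiver would read the same values with the corresponding `ReadInt32`, `ReadFloat32` and `ReadString` calls, in the same order.

```python
from mlagents_envs.side_channel.side_channel import IncomingMessage, OutgoingMessage

# Write several values of different types into one message...
msg_out = OutgoingMessage()
msg_out.write_int32(3)              # e.g. a level index (illustrative values)
msg_out.write_float32(0.75)         # e.g. a difficulty setting
msg_out.write_string("night_mode")  # e.g. a scenario name

# ...and read them back in exactly the same order on the receiving side.
msg_in = IncomingMessage(msg_out.buffer)
level = msg_in.read_int32()
difficulty = msg_in.read_float32()
scenario = msg_in.read_string()
print(level, difficulty, scenario)
```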
diff --git a/com.unity.ml-agents/Documentation~/ELO-Rating-System.md b/com.unity.ml-agents/Documentation~/ELO-Rating-System.md new file mode 100644 index 0000000000..df22dc53fe --- /dev/null +++ b/com.unity.ml-agents/Documentation~/ELO-Rating-System.md @@ -0,0 +1,60 @@
# ELO Rating System
In adversarial games, the cumulative environment reward may **not be a meaningful metric** by which to track learning progress.

This is because the cumulative reward is **entirely dependent on the skill of the opponent**.

An agent at a particular skill level will get more or less reward against a worse or better agent, respectively.

Instead, it's better to use the ELO rating system, a method to calculate **the relative skill level between two players in a zero-sum game**.

If training is progressing correctly, **this value should steadily increase**.

## What is a zero-sum game?
A zero-sum game is a game where **each player's gain or loss of utility is exactly balanced by the gain or loss of the utility of the opponent**.

Simply put, we face a zero-sum game **when one agent gets +1.0 and its opponent gets -1.0 reward**.

For instance, Tennis is a zero-sum game: if you win the point you get +1.0 and your opponent gets -1.0 reward.

## How the ELO Rating System Works
- Each player **has an initial ELO score**. It's defined by the `initial_elo` trainer config hyperparameter.

- The **difference in rating between the two players** serves as the predictor of the outcome of a match.

![Example Elo](images/elo_example.png)
*For instance, if player A has an ELO score of 2100 and player B has an ELO score of 1800, the chance that player A wins is 85% against 15% for player B.*

- We calculate the **expected score of each player** using this formula:

![Elo Expected Score Formula](images/elo_expected_score_formula.png)

- At the end of the game, based on the outcome, **we update the player's actual ELO score** using a linear adjustment proportional to the amount by which the player over-performed or under-performed.
The winning player takes points from the losing one:
  - If the *higher-rated player wins* → **a few points** will be taken from the lower-rated player.
  - If the *lower-rated player wins* → **a lot of points** will be taken from the higher-rated player.
  - If it's *a draw* → the lower-rated player gains **a few points** from the higher-rated player.

- We update each player's rating using this formula:

![Elo Score Update Formula](images/elo_score_update_formula.png)

### The Tennis example

- We start to train our agents.
- Both of them have the same skill, so each starts with the ELO score defined by the `initial_elo` parameter: `1200.0`.

We calculate the expected score E:

Ea = 0.5

Eb = 0.5

This means that each player has a 50% chance of winning the point.

If A wins, the new rating R would be:

Ra = 1200 + 16 * (1 - 0.5) → 1208

Rb = 1200 + 16 * (0 - 0.5) → 1192

Player A now has an ELO score of 1208 and Player B an ELO score of 1192. Therefore, Player A is now a little bit **better than Player B**.
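The two formulas shown in the images above can be written out directly. The sketch below uses the standard ELO expressions (an expected score based on a 400-point scale and a linear update with a K-factor); K = 16 is assumed here only because it reproduces the Tennis example numbers, and is not necessarily the value used by the trainer.

```python
def expected_score(rating_a, rating_b):
    # Probability that player A beats player B under the ELO model.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def updated_rating(rating, expected, actual, k=16):
    # Linear adjustment proportional to over- or under-performance.
    return rating + k * (actual - expected)

# The Tennis example: both players start at initial_elo = 1200.
ra = rb = 1200.0
ea = expected_score(ra, rb)  # 0.5
eb = expected_score(rb, ra)  # 0.5

# Player A wins the point (actual score: 1 for A, 0 for B).
ra = updated_rating(ra, ea, 1)  # 1208.0
rb = updated_rating(rb, eb, 0)  # 1192.0
```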
diff --git a/com.unity.ml-agents/Documentation~/FAQ.md b/com.unity.ml-agents/Documentation~/FAQ.md new file mode 100644 index 0000000000..ee9135b308 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/FAQ.md @@ -0,0 +1,79 @@ +# Frequently Asked Questions + +## Installation problems + +## Environment Permission Error + +If you directly import your Unity environment without building it in the editor, +you might need to give it additional permissions to execute it. + +If you receive such a permission error on macOS, run: + +```sh +chmod -R 755 *.app +``` + +or on Linux: + +```sh +chmod -R 755 *.x86_64 +``` + +On Windows, you can find +[instructions](https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/cc754344(v=ws.11)). + +## Environment Connection Timeout + +If you are able to launch the environment from `UnityEnvironment` but then +receive a timeout error like this: + +``` +UnityAgentsException: The Communicator was unable to connect. Please make sure the External process is ready to accept communication with Unity. +``` + +There may be a number of possible causes: + +- _Cause_: There may be no agent in the scene +- _Cause_: On OSX, the firewall may be preventing communication with the + environment. _Solution_: Add the built environment binary to the list of + exceptions on the firewall by following + [instructions](https://support.apple.com/en-us/HT201642). +- _Cause_: An error happened in the Unity Environment preventing communication. + _Solution_: Look into the + [log files](https://docs.unity3d.com/Manual/LogFiles.html) generated by the + Unity Environment to figure what error happened. +- _Cause_: You have assigned `HTTP_PROXY` and `HTTPS_PROXY` values in your + environment variables. _Solution_: Remove these values and try again. +- _Cause_: You are running in a headless environment (e.g. remotely connected + to a server). _Solution_: Pass `--no-graphics` to `mlagents-learn`, or + `no_graphics=True` to `RemoteRegistryEntry.make()` or the `UnityEnvironment` + initializer. If you need graphics for visual observations, you will need to + set up `xvfb` (or equivalent). + +## Communication port {} still in use + +If you receive an exception +`"Couldn't launch new environment because communication port {} is still in use. "`, +you can change the worker number in the Python script when calling + +```python +UnityEnvironment(file_name=filename, worker_id=X) +``` + +## Mean reward : nan + +If you receive a message `Mean reward : nan` when attempting to train a model +using PPO, this is due to the episodes of the Learning Environment not +terminating. In order to address this, set `Max Steps` for the Agents within the +Scene Inspector to a value greater than 0. Alternatively, it is possible to +manually set `done` conditions for episodes from within scripts for custom +episode-terminating events. + +## "File name" cannot be opened because the developer cannot be verified. + +If you have downloaded the repository using the github website on macOS 10.15 (Catalina) +or later, you may see this error when attempting to play scenes in the Unity project. +Workarounds include installing the package using the Unity Package Manager (this is +the officially supported approach - see [here](Installation.md)), or following the +instructions [here](https://support.apple.com/en-us/HT202491) to verify the relevant +files on your machine on a file-by-file basis. 
diff --git a/com.unity.ml-agents/Documentation~/Getting-Started.md b/com.unity.ml-agents/Documentation~/Getting-Started.md new file mode 100644 index 0000000000..33ce2e862b --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Getting-Started.md @@ -0,0 +1,265 @@ +# Getting Started Guide + +This guide walks through the end-to-end process of opening one of our +[example environments](Learning-Environment-Examples.md) in Unity, training an +Agent in it, and embedding the trained model into the Unity environment. After +reading this tutorial, you should be able to train any of the example +environments. If you are not familiar with the +[Unity Engine](https://unity3d.com/unity), view our +[Background: Unity](Background-Unity.md) page for helpful pointers. +Additionally, if you're not familiar with machine learning, view our +[Background: Machine Learning](Background-Machine-Learning.md) page for a brief +overview and helpful pointers. + +![3D Balance Ball](images/balance.png) + +For this guide, we'll use the **3D Balance Ball** environment which contains a +number of agent cubes and balls (which are all copies of each other). Each agent +cube tries to keep its ball from falling by rotating either horizontally or +vertically. In this environment, an agent cube is an **Agent** that receives a +reward for every step that it balances the ball. An agent is also penalized with +a negative reward for dropping the ball. The goal of the training process is to +have the agents learn to balance the ball on their head. + +Let's get started! + +## Installation + +If you haven't already, follow the [installation instructions](Installation.md). +Afterwards, open the Unity Project that contains all the example environments: + +1. Open the Package Manager Window by navigating to `Window -> Package Manager` + in the menu. +1. Navigate to the ML-Agents Package and click on it. +1. Find the `3D Ball` sample and click `Import`. +1. In the **Project** window, go to the + `Assets/ML-Agents/Examples/3DBall/Scenes` folder and open the `3DBall` scene + file. + +## Understanding a Unity Environment + +An agent is an autonomous actor that observes and interacts with an +_environment_. In the context of Unity, an environment is a scene containing one +or more Agent objects, and, of course, the other entities that an agent +interacts with. + +![Unity Editor](images/mlagents-3DBallHierarchy.png) + +**Note:** In Unity, the base object of everything in a scene is the +_GameObject_. The GameObject is essentially a container for everything else, +including behaviors, graphics, physics, etc. To see the components that make up +a GameObject, select the GameObject in the Scene window, and open the Inspector +window. The Inspector shows every component on a GameObject. + +The first thing you may notice after opening the 3D Balance Ball scene is that +it contains not one, but several agent cubes. Each agent cube in the scene is an +independent agent, but they all share the same Behavior. 3D Balance Ball does +this to speed up training since all twelve agents contribute to training in +parallel. + +### Agent + +The Agent is the actor that observes and takes actions in the environment. In +the 3D Balance Ball environment, the Agent components are placed on the twelve +"Agent" GameObjects. The base Agent object has a few properties that affect its +behavior: + +- **Behavior Parameters** — Every Agent must have a Behavior. The Behavior + determines how an Agent makes decisions. 
+- **Max Step** — Defines how many simulation steps can occur before the Agent's + episode ends. In 3D Balance Ball, an Agent restarts after 5000 steps. + +#### Behavior Parameters : Vector Observation Space + +Before making a decision, an agent collects its observation about its state in +the world. The vector observation is a vector of floating point numbers which +contain relevant information for the agent to make decisions. + +The Behavior Parameters of the 3D Balance Ball example uses a `Space Size` of 8. +This means that the feature vector containing the Agent's observations contains +eight elements: the `x` and `z` components of the agent cube's rotation and the +`x`, `y`, and `z` components of the ball's relative position and velocity. + +#### Behavior Parameters : Actions + +An Agent is given instructions in the form of actions. +ML-Agents Toolkit classifies actions into two types: continuous and discrete. +The 3D Balance Ball example is programmed to use continuous actions, which +are a vector of floating-point numbers that can vary continuously. More specifically, +it uses a `Space Size` of 2 to control the amount of `x` and `z` rotations to apply to +itself to keep the ball balanced on its head. + +## Running a pre-trained model + +We include pre-trained models for our agents (`.onnx` files) and we use the +[Inference Engine](Inference-Engine.md) to run these models inside +Unity. In this section, we will use the pre-trained model for the 3D Ball +example. + +1. In the **Project** window, go to the + `Assets/ML-Agents/Examples/3DBall/Prefabs` folder. Expand `3DBall` and click + on the `Agent` prefab. You should see the `Agent` prefab in the **Inspector** + window. + + **Note**: The platforms in the `3DBall` scene were created using the `3DBall` + prefab. Instead of updating all 12 platforms individually, you can update the + `3DBall` prefab instead. + + ![Platform Prefab](images/platform_prefab.png) + +1. In the **Project** window, drag the **3DBall** Model located in + `Assets/ML-Agents/Examples/3DBall/TFModels` into the `Model` property under + `Behavior Parameters (Script)` component in the Agent GameObject + **Inspector** window. + + ![3dball learning brain](images/3dball_learning_brain.png) + +1. You should notice that each `Agent` under each `3DBall` in the **Hierarchy** + windows now contains **3DBall** as `Model` on the `Behavior Parameters`. + **Note** : You can modify multiple game objects in a scene by selecting them + all at once using the search bar in the Scene Hierarchy. +1. Set the **Inference Device** to use for this model as `CPU`. +1. Click the **Play** button in the Unity Editor and you will see the platforms + balance the balls using the pre-trained model. + +## Training a new model with Reinforcement Learning + +While we provide pre-trained models for the agents in this environment, any +environment you make yourself will require training agents from scratch to +generate a new model file. In this section we will demonstrate how to use the +reinforcement learning algorithms that are part of the ML-Agents Python package +to accomplish this. We have provided a convenient command `mlagents-learn` which +accepts arguments used to configure both training and inference phases. + +### Training the environment + +1. Open a command or terminal window. +1. Navigate to the folder where you cloned the `ml-agents` repository. **Note**: + If you followed the default [installation](Installation.md), then you should + be able to run `mlagents-learn` from any directory. +1. 
Run `mlagents-learn config/ppo/3DBall.yaml --run-id=first3DBallRun`. + - `config/ppo/3DBall.yaml` is the path to a default training + configuration file that we provide. The `config/ppo` folder includes training configuration + files for all our example environments, including 3DBall. + - `run-id` is a unique name for this training session. +1. When the message _"Start training by pressing the Play button in the Unity + Editor"_ is displayed on the screen, you can press the **Play** button in + Unity to start training in the Editor. + +If `mlagents-learn` runs correctly and starts training, you should see something +like this: + +```console +INFO:mlagents_envs: +'Ball3DAcademy' started successfully! +Unity Academy name: Ball3DAcademy + +INFO:mlagents_envs:Connected new brain: +Unity brain name: 3DBallLearning + Number of Visual Observations (per agent): 0 + Vector Observation space size (per agent): 8 + Number of stacked Vector Observation: 1 +INFO:mlagents_envs:Hyperparameters for the PPO Trainer of brain 3DBallLearning: + batch_size: 64 + beta: 0.001 + buffer_size: 12000 + epsilon: 0.2 + gamma: 0.995 + hidden_units: 128 + lambd: 0.99 + learning_rate: 0.0003 + max_steps: 5.0e4 + normalize: True + num_epoch: 3 + num_layers: 2 + time_horizon: 1000 + sequence_length: 64 + summary_freq: 1000 + use_recurrent: False + memory_size: 256 + use_curiosity: False + curiosity_strength: 0.01 + curiosity_enc_size: 128 + output_path: ./results/first3DBallRun/3DBallLearning +INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training. +INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training. +INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 3000. Mean Reward: 1.804. Std of Reward: 1.056. Training. +INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 4000. Mean Reward: 2.151. Std of Reward: 1.432. Training. +INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 5000. Mean Reward: 3.175. Std of Reward: 2.250. Training. +INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 6000. Mean Reward: 4.898. Std of Reward: 4.019. Training. +INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 7000. Mean Reward: 6.716. Std of Reward: 5.125. Training. +INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 8000. Mean Reward: 12.124. Std of Reward: 11.929. Training. +INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 9000. Mean Reward: 18.151. Std of Reward: 16.871. Training. +INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 10000. Mean Reward: 27.284. Std of Reward: 28.667. Training. +``` + +Note how the `Mean Reward` value printed to the screen increases as training +progresses. This is a positive sign that training is succeeding. + +**Note**: You can train using an executable rather than the Editor. To do so, +follow the instructions in +[Using an Executable](Learning-Environment-Executable.md). + +### Observing Training Progress + +Once you start training using `mlagents-learn` in the way described in the +previous section, the `ml-agents` directory will contain a `results` +directory. In order to observe the training process in more detail, you can use +TensorBoard. From the command line run: + +```sh +tensorboard --logdir results +``` + +Then navigate to `localhost:6006` in your browser to view the TensorBoard +summary statistics as shown below. 
For the purposes of this section, the most +important statistic is `Environment/Cumulative Reward` which should increase +throughout training, eventually converging close to `100` which is the maximum +reward the agent can accumulate. + +![Example TensorBoard Run](images/mlagents-TensorBoard.png) + +## Embedding the model into the Unity Environment + +Once the training process completes, and the training process saves the model +(denoted by the `Saved Model` message) you can add it to the Unity project and +use it with compatible Agents (the Agents that generated the model). **Note:** +Do not just close the Unity Window once the `Saved Model` message appears. +Either wait for the training process to close the window or press `Ctrl+C` at +the command-line prompt. If you close the window manually, the `.onnx` file +containing the trained model is not exported into the ml-agents folder. + +If you've quit the training early using `Ctrl+C` and want to resume training, +run the same command again, appending the `--resume` flag: + +```sh +mlagents-learn config/ppo/3DBall.yaml --run-id=first3DBallRun --resume +``` + +Your trained model will be at `results//.onnx` where +`` is the name of the `Behavior Name` of the agents corresponding +to the model. This file corresponds to your model's latest checkpoint. You can +now embed this trained model into your Agents by following the steps below, +which is similar to the steps described [above](#running-a-pre-trained-model). + +1. Move your model file into + `Project/Assets/ML-Agents/Examples/3DBall/TFModels/`. +1. Open the Unity Editor, and select the **3DBall** scene as described above. +1. Select the **3DBall** prefab Agent object. +1. Drag the `.onnx` file from the Project window of the Editor to + the **Model** placeholder in the **Ball3DAgent** inspector window. +1. Press the **Play** button at the top of the Editor. + +## Next Steps + +- For more information on the ML-Agents Toolkit, in addition to helpful + background, check out the [ML-Agents Toolkit Overview](ML-Agents-Overview.md) + page. +- For a "Hello World" introduction to creating your own Learning Environment, + check out the + [Making a New Learning Environment](Learning-Environment-Create-New.md) page. +- For an overview on the more complex example environments that are provided in + this toolkit, check out the + [Example Environments](Learning-Environment-Examples.md) page. +- For more information on the various training options available, check out the + [Training ML-Agents](Training-ML-Agents.md) page. diff --git a/com.unity.ml-agents/Documentation~/Glossary.md b/com.unity.ml-agents/Documentation~/Glossary.md new file mode 100644 index 0000000000..7d912fe296 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Glossary.md @@ -0,0 +1,36 @@ +# ML-Agents Toolkit Glossary + +- **Academy** - Singleton object which controls timing, reset, and + training/inference settings of the environment. +- **Action** - The carrying-out of a decision on the part of an agent within the + environment. +- **Agent** - Unity Component which produces observations and takes actions in + the environment. Agents actions are determined by decisions produced by a + Policy. +- **Decision** - The specification produced by a Policy for an action to be + carried out given an observation. +- **Editor** - The Unity Editor, which may include any pane (e.g. Hierarchy, + Scene, Inspector). +- **Environment** - The Unity scene which contains Agents. 
+- **Experience** - Corresponds to a tuple of [Agent observations, actions, + rewards] of a single Agent obtained after a Step. +- **External Coordinator** - ML-Agents class responsible for communication with + outside processes (in this case, the Python API). +- **FixedUpdate** - Unity method called each time the game engine is stepped. + ML-Agents logic should be placed here. +- **Frame** - An instance of rendering the main camera for the display. + Corresponds to each `Update` call of the game engine. +- **Observation** - Partial information describing the state of the environment + available to a given agent. (e.g. Vector, Visual) +- **Policy** - The decision making mechanism for producing decisions from + observations, typically a neural network model. +- **Reward** - Signal provided at every step used to indicate desirability of an + agent’s action within the current state of the environment. +- **State** - The underlying properties of the environment (including all agents + within it) at a given time. +- **Step** - Corresponds to an atomic change of the engine that happens between + Agent decisions. +- **Trainer** - Python class which is responsible for training a given group of + Agents. +- **Update** - Unity function called each time a frame is rendered. ML-Agents + logic should not be placed here. diff --git a/com.unity.ml-agents/Documentation~/Hugging-Face-Integration.md b/com.unity.ml-agents/Documentation~/Hugging-Face-Integration.md new file mode 100644 index 0000000000..189624817f --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Hugging-Face-Integration.md @@ -0,0 +1,56 @@ +# The Hugging Face Integration + +The [Hugging Face Hub 🤗](https://huggingface.co/models?pipeline_tag=reinforcement-learning) is a central place **where anyone can share and download models**. + +It allows you to: +- **Host** your trained models. +- **Download** trained models from the community. +- Visualize your agents **playing directly on your browser**. + +You can see the list of ml-agents models [here](https://huggingface.co/models?library=ml-agents). + +We wrote a **complete tutorial to learn to train your first agent using ML-Agents and publish it to the Hub**: + +- A short tutorial where you [teach **Huggy the Dog to fetch the stick** and then play with him directly in your browser](https://huggingface.co/learn/deep-rl-course/unitbonus1/introduction) +- A [more in-depth tutorial](https://huggingface.co/learn/deep-rl-course/unit5/introduction) + +## Download a model from the Hub + +You can simply download a model from the Hub using `mlagents-load-from-hf`. + +You need to define two parameters: + +- `--repo-id`: the name of the Hugging Face repo you want to download. +- `--local-dir`: the path to download the model. + +For instance, I want to load the model with model-id "ThomasSimonini/MLAgents-Pyramids" and put it in the downloads directory: + +```sh +mlagents-load-from-hf --repo-id="ThomasSimonini/MLAgents-Pyramids" --local-dir="./downloads" +``` + +## Upload a model to the Hub + +You can simply upload a model to the Hub using `mlagents-push-to-hf` + +You need to define four parameters: + +- `--run-id`: the name of the training run id. +- `--local-dir`: where the model was saved +- `--repo-id`: the name of the Hugging Face repo you want to create or update. It’s always / If the repo does not exist it will be created automatically +- `--commit-message`: since HF repos are git repositories you need to give a commit message. 
+ +For instance, I want to upload my model trained with run-id "SnowballTarget1" to the repo-id: ThomasSimonini/ppo-SnowballTarget: + +```sh + mlagents-push-to-hf --run-id="SnowballTarget1" --local-dir="./results/SnowballTarget1" --repo-id="ThomasSimonini/ppo-SnowballTarget" --commit-message="First Push" +``` + +## Visualize an agent playing + +You can watch your agent playing directly in your browser (if the environment is from the [ML-Agents official environments](Learning-Environment-Examples.md)) + +- Step 1: Go to https://huggingface.co/unity and select the environment demo. +- Step 2: Find your model_id in the list. +- Step 3: Select your .nn /.onnx file. +- Step 4: Click on Watch the agent play diff --git a/com.unity.ml-agents/Documentation~/Inference-Engine.md b/com.unity.ml-agents/Documentation~/Inference-Engine.md new file mode 100644 index 0000000000..0f7c3e3402 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Inference-Engine.md @@ -0,0 +1,50 @@ +# Inference Engine + +The ML-Agents Toolkit allows you to use pre-trained neural network models inside +your Unity games. This support is possible thanks to the +[Inference Engine](https://docs.unity3d.com/Packages/com.unity.ai.inference@latest). +Inference Engine uses +[compute shaders](https://docs.unity3d.com/Manual/class-ComputeShader.html) to +run the neural network within Unity. + +## Supported devices + +Inference Engine supports [all Unity runtime platforms](https://docs.unity3d.com/Manual/PlatformSpecific.html). + +Scripting Backends : Inference Engine is generally faster with +**IL2CPP** than with **Mono** for Standalone builds. In the Editor, It is not +possible to use Inference Engine with GPU device selected when Editor +Graphics Emulation is set to **OpenGL(ES) 3.0 or 2.0 emulation**. Also there +might be non-fatal build time errors when target platform includes Graphics API +that does not support **Unity Compute Shaders**. + +In cases when it is not possible to use compute shaders on the target platform, +inference can be performed using **CPU** or **GPUPixel** Inference Engine backends. + +## Using Inference Engine + +When using a model, drag the model file into the **Model** field in the +Inspector of the Agent. Select the **Inference Device**: **Compute Shader**, **Burst** or +**Pixel Shader** you want to use for inference. + +**Note:** For most of the models generated with the ML-Agents Toolkit, CPU inference (**Burst**) will +be faster than GPU inference (**Compute Shader** or **Pixel Shader**). +You should use GPU inference only if you use the ResNet visual +encoder or have a large number of agents with visual observations. + +# Unsupported use cases +## Externally trained models +The ML-Agents Toolkit only supports the models created with our trainers. Model +loading expects certain conventions for constants and tensor names. While it is +possible to construct a model that follows these conventions, we don't provide +any additional help for this. More details can be found in +[TensorNames.cs](https://github.com/Unity-Technologies/ml-agents/blob/release_22_docs/com.unity.ml-agents/Runtime/Inference/TensorNames.cs) +and +[SentisModelParamLoader.cs](https://github.com/Unity-Technologies/ml-agents/blob/release_22_docs/com.unity.ml-agents/Runtime/Inference/SentisModelParamLoader.cs). + +If you wish to run inference on an externally trained model, you should use +Inference Engine directly, instead of trying to run it through ML-Agents. 
+ +## Model inference outside of Unity +We do not provide support for inference anywhere outside of Unity. The `.onnx` files produced by training use the open format ONNX; if you wish to convert a `.onnx` file to another +format or run inference with them, refer to their documentation. diff --git a/com.unity.ml-agents/Documentation~/InputSystem-Integration.md b/com.unity.ml-agents/Documentation~/InputSystem-Integration.md new file mode 100644 index 0000000000..ba27818f02 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/InputSystem-Integration.md @@ -0,0 +1,41 @@ +# Input System Integration + +The ML-Agents package integrates with the [Input System Package](https://docs.unity3d.com/Packages/com.unity.inputsystem@1.1/manual/QuickStartGuide.html) through the `InputActuatorComponent`. This component sets up an action space for your `Agent` based on an `InputActionAsset` that is referenced by the `IInputActionAssetProvider` interface, or the `PlayerInput` component that may be living on your player controlled `Agent`. This means that if you have code outside of your agent that handles input, you will not need to implement the Heuristic function in agent as well. The `InputActuatorComponent` will handle this for you. You can now train and run inference on `Agents` with an action space defined by an `InputActionAsset`. + +Take a look at how we have implemented the C# code in the example Input Integration scene (located under Project/Assets/ML-Agents/Examples/PushBlockWithInput/). Once you have some familiarity, then the next step would be to add the InputActuatorComponent to your player Agent. The example we have implemented uses C# Events to send information from the Input System. + +## Getting Started with Input System Integration +1. Add the `com.unity.inputsystem` version 1.1.0-preview.3 or later to your project via the Package Manager window. +2. If you have already setup an InputActionAsset skip to Step 3, otherwise follow these sub steps: +1. Create an InputActionAsset to allow your Agent to be controlled by the Input System. +2. Handle the events from the Input System where you normally would (i.e. a script external to your Agent class). +3. Add the InputSystemActuatorComponent to the GameObject that has the `PlayerInput` and `Agent` components attached. + +Additionally, see below for additional technical specifications on the C# code for the InputActuatorComponent. +## Technical Specifications + +### `IInputActionsAssetProvider` Interface +The `InputActuatorComponent` searches for a `Component` that implements +`IInputActionAssetProvider` on the `GameObject` they both are attached to. It is important to note +that if multiple `Components` on your `GameObject` need to access an `InputActionAsset` to handle events, +they will need to share the same instance of the `InputActionAsset` that is returned from the +`IInputActionAssetProvider`. + +### `InputActuatorComponent` Class +The `InputActuatorComponent` is the bridge between ML-Agents and the Input System. It allows ML-Agents to: +* create an `ActionSpec` for your Agent based on an `InputActionAsset` that comes from an + `IInputActionAssetProvider`. +* send simulated input from a training process or a neural network +* let developers keep their input handling code in one place + +This is accomplished by adding the `InputActuatorComponent` to an Agent which already has the PlayerInput component attached. 
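+
+To keep input handling in one place, a common pattern is a small, ordinary input-handling
+script that lives next to the `Agent` and `PlayerInput` components. During play the player
+drives it through the Input System; during training the `InputActuatorComponent` drives the
+same actions with simulated input, so both share this code path. The sketch below is
+illustrative only: it assumes a `PlayerInput` component set to _Invoke Unity Events_ with a
+"Move" action, the class name and movement logic are hypothetical, and the
+`IInputActionAssetProvider` sharing described above is not shown here.
+
+```csharp
+using UnityEngine;
+using UnityEngine.InputSystem;
+
+// Ordinary input handling, external to the Agent class. The InputActuatorComponent
+// sends simulated input through the same Input System actions during training,
+// so human play and ML-Agents exercise the same handler.
+public class PushBlockInputHandler : MonoBehaviour
+{
+    public float speed = 5f;
+    Vector2 m_MoveInput;
+
+    // Hooked up to a "Move" action via PlayerInput's "Invoke Unity Events" behavior.
+    public void OnMove(InputAction.CallbackContext context)
+    {
+        m_MoveInput = context.ReadValue<Vector2>();
+    }
+
+    void FixedUpdate()
+    {
+        var delta = new Vector3(m_MoveInput.x, 0f, m_MoveInput.y) * speed * Time.fixedDeltaTime;
+        transform.position += delta;
+    }
+}
+```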
+ +## Requirements + +If using the `InputActuatorComponent`, install the `com.unity.inputsystem` package `version 1.1.0-preview.3` or later. + +## Known Limitations + +For the `InputActuatorComponent` +- Limited implementation of `InputControls` +- No way to customize the action space of the `InputActuatorComponent` diff --git a/com.unity.ml-agents/Documentation~/Installation-Anaconda-Windows.md b/com.unity.ml-agents/Documentation~/Installation-Anaconda-Windows.md new file mode 100644 index 0000000000..3b80adbdf0 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Installation-Anaconda-Windows.md @@ -0,0 +1,362 @@ +# Installing ML-Agents Toolkit for Windows (Deprecated) + +:warning: **Note:** We no longer use this guide ourselves and so it may not work +correctly. We've decided to keep it up just in case it is helpful to you. + +The ML-Agents Toolkit supports Windows 10. While it might be possible to run the +ML-Agents Toolkit using other versions of Windows, it has not been tested on +other versions. Furthermore, the ML-Agents Toolkit has not been tested on a +Windows VM such as Bootcamp or Parallels. + +To use the ML-Agents Toolkit, you install Python and the required Python +packages as outlined below. This guide also covers how set up GPU-based training +(for advanced users). GPU-based training is not currently required for the +ML-Agents Toolkit. However, training on a GPU might be required by future +versions and features. + +## Step 1: Install Python via Anaconda + +[Download](https://www.anaconda.com/download/#windows) and install Anaconda for +Windows. By using Anaconda, you can manage separate environments for different +distributions of Python. Python 3.7.2 or higher is required as we no longer +support Python 2. In this guide, we are using Python version 3.7 and Anaconda +version 5.1 +([64-bit](https://repo.continuum.io/archive/Anaconda3-5.1.0-Windows-x86_64.exe) +or [32-bit](https://repo.continuum.io/archive/Anaconda3-5.1.0-Windows-x86.exe) +direct links). + +
+_(Image: Anaconda Install)_
+ +We recommend the default _advanced installation options_. However, select the +options appropriate for your specific situation. + +
+_(Image: Anaconda Install)_
+ +After installation, you must open **Anaconda Navigator** to finish the setup. +From the Windows search bar, type _anaconda navigator_. You can close Anaconda +Navigator after it opens. + +If environment variables were not created, you will see error "conda is not +recognized as internal or external command" when you type `conda` into the +command line. To solve this you will need to set the environment variable +correctly. + +Type `environment variables` in the search bar (this can be reached by hitting +the Windows key or the bottom left Windows button). You should see an option +called **Edit the system environment variables**. + +
+_(Image: edit env variables)_
+
+From here, click the **Environment Variables** button. Double click "Path" under
+**System variables** to edit the "Path" variable, then click **New** to add the
+following new paths.
+
+```console
+%UserProfile%\Anaconda3\Scripts
+%UserProfile%\Anaconda3\Scripts\conda.exe
+%UserProfile%\Anaconda3
+%UserProfile%\Anaconda3\python.exe
+```
+
+## Step 2: Setup and Activate a New Conda Environment
+
+You will create a new [Conda environment](https://conda.io/docs/) to be used
+with the ML-Agents Toolkit. This means that all the packages that you install
+are localized to just this environment. It will not affect any other
+installation of Python or other environments. Whenever you want to run
+ML-Agents, you will need to activate this Conda environment.
+
+To create a new Conda environment, open a new Anaconda Prompt (_Anaconda Prompt_
+in the search bar) and type in the following command:
+
+```sh
+conda create -n ml-agents python=3.7
+```
+
+You may be asked to install new packages. Type `y` and press enter _(make sure
+you are connected to the Internet)_. You must install these required packages.
+The new Conda environment is called ml-agents and uses Python version 3.7.
+
+_(Image: Anaconda Install)_
+ +To use this environment, you must activate it. _(To use this environment In the +future, you can run the same command)_. In the same Anaconda Prompt, type in the +following command: + +```sh +activate ml-agents +``` + +You should see `(ml-agents)` prepended on the last line. + +Next, install `tensorflow`. Install this package using `pip` - which is a +package management system used to install Python packages. Latest versions of +TensorFlow won't work, so you will need to make sure that you install version +1.7.1. In the same Anaconda Prompt, type in the following command _(make sure +you are connected to the Internet)_: + +```sh +pip install tensorflow==1.7.1 +``` + +## Step 3: Install Required Python Packages + +The ML-Agents Toolkit depends on a number of Python packages. Use `pip` to +install these Python dependencies. + +If you haven't already, clone the ML-Agents Toolkit Github repository to your +local computer. You can do this using Git +([download here](https://git-scm.com/download/win)) and running the following +commands in an Anaconda Prompt _(if you open a new prompt, be sure to activate +the ml-agents Conda environment by typing `activate ml-agents`)_: + +```sh +git clone --branch release_22 https://github.com/Unity-Technologies/ml-agents.git +``` + +The `--branch release_22` option will switch to the tag of the latest stable +release. Omitting that will get the `main` branch which is potentially +unstable. + +If you don't want to use Git, you can find download links on the +[releases page](https://github.com/Unity-Technologies/ml-agents/releases). + +The `com.unity.ml-agents` subdirectory contains the core code to add to your +projects. The `Project` subdirectory contains many +[example environments](Learning-Environment-Examples.md) to help you get +started. + +The `ml-agents` subdirectory contains a Python package which provides deep +reinforcement learning trainers to use with Unity environments. + +The `ml-agents-envs` subdirectory contains a Python API to interface with Unity, +which the `ml-agents` package depends on. + +The `gym-unity` subdirectory contains a package to interface with OpenAI Gym. + +Keep in mind where the files were downloaded, as you will need the trainer +config files in this directory when running `mlagents-learn`. Make sure you are +connected to the Internet and then type in the Anaconda Prompt: + +```console +python -m pip install mlagents==1.1.0 +``` + +This will complete the installation of all the required Python packages to run +the ML-Agents Toolkit. + +Sometimes on Windows, when you use pip to install certain Python packages, the +pip will get stuck when trying to read the cache of the package. If you see +this, you can try: + +```console +python -m pip install mlagents==1.1.0 --no-cache-dir +``` + +This `--no-cache-dir` tells the pip to disable the cache. + +### Installing for Development + +If you intend to make modifications to `ml-agents` or `ml-agents-envs`, you +should install the packages from the cloned repo rather than from PyPi. To do +this, you will need to install `ml-agents` and `ml-agents-envs` separately. + +In our example, the files are located in `C:\Downloads`. After you have either +cloned or downloaded the files, from the Anaconda Prompt, change to the +ml-agents subdirectory inside the ml-agents directory: + +```console +cd C:\Downloads\ml-agents +``` + +From the repo's main directory, now run: + +```console +cd ml-agents-envs +pip install -e . +cd .. +cd ml-agents +pip install -e . 
+``` + +Running pip with the `-e` flag will let you make changes to the Python files +directly and have those reflected when you run `mlagents-learn`. It is important +to install these packages in this order as the `mlagents` package depends on +`mlagents_envs`, and installing it in the other order will download +`mlagents_envs` from PyPi. + +## (Optional) Step 4: GPU Training using The ML-Agents Toolkit + +GPU is not required for the ML-Agents Toolkit and won't speed up the PPO +algorithm a lot during training(but something in the future will benefit from +GPU). This is a guide for advanced users who want to train using GPUs. +Additionally, you will need to check if your GPU is CUDA compatible. Please +check Nvidia's page [here](https://developer.nvidia.com/cuda-gpus). + +Currently for the ML-Agents Toolkit, only CUDA v9.0 and cuDNN v7.0.5 is +supported. + +### Install Nvidia CUDA toolkit + +[Download](https://developer.nvidia.com/cuda-toolkit-archive) and install the +CUDA toolkit 9.0 from Nvidia's archive. The toolkit includes GPU-accelerated +libraries, debugging and optimization tools, a C/C++ (Step Visual Studio 2017) +compiler and a runtime library and is needed to run the ML-Agents Toolkit. In +this guide, we are using version +[9.0.176](https://developer.nvidia.com/compute/cuda/9.0/Prod/network_installers/cuda_9.0.176_win10_network-exe)). + +Before installing, please make sure you **close any running instances of Unity +or Visual Studio**. + +Run the installer and select the Express option. Note the directory where you +installed the CUDA toolkit. In this guide, we installed in the directory +`C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0` + +### Install Nvidia cuDNN library + +[Download](https://developer.nvidia.com/cudnn) and install the cuDNN library +from Nvidia. cuDNN is a GPU-accelerated library of primitives for deep neural +networks. Before you can download, you will need to sign up for free to the +Nvidia Developer Program. + +
+_(Image: cuDNN membership required)_
+ +Once you've signed up, go back to the cuDNN +[downloads page](https://developer.nvidia.com/cudnn). You may or may not be +asked to fill out a short survey. When you get to the list cuDNN releases, +**make sure you are downloading the right version for the CUDA toolkit you +installed in Step 1.** In this guide, we are using version 7.0.5 for CUDA +toolkit version 9.0 +([direct link](https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7.0.5/prod/9.0_20171129/cudnn-9.0-windows10-x64-v7)). + +After you have downloaded the cuDNN files, you will need to extract the files +into the CUDA toolkit directory. In the cuDNN zip file, there are three folders +called `bin`, `include`, and `lib`. + +
+_(Image: cuDNN zip files)_
+ +Copy these three folders into the CUDA toolkit directory. The CUDA toolkit +directory is located at +`C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0` + +
+_(Image: cuda toolkit directory)_
+ +### Set Environment Variables + +You will need to add one environment variable and two path variables. + +To set the environment variable, type `environment variables` in the search bar +(this can be reached by hitting the Windows key or the bottom left Windows +button). You should see an option called **Edit the system environment +variables**. + +
+_(Image: edit env variables)_
+
+From here, click the **Environment Variables** button. Click **New** to add a
+new system variable _(make sure you do this under **System variables** and not
+User variables)_.
+
+_(Image: new system variable)_
+ +For **Variable Name**, enter `CUDA_HOME`. For the variable value, put the +directory location for the CUDA toolkit. In this guide, the directory location +is `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0`. Press **OK** once. + +
+_(Image: system variable names and values)_
+ +To set the two path variables, inside the same **Environment Variables** window +and under the second box called **System Variables**, find a variable called +`Path` and click **Edit**. You will add two directories to the list. For this +guide, the two entries would look like: + +```console +C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64 +C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\extras\CUPTI\libx64 +``` + +Make sure to replace the relevant directory location with the one you have +installed. _Please note that case sensitivity matters_. + +
+_(Image: Path variables)_
+ +### Install TensorFlow GPU + +Next, install `tensorflow-gpu` using `pip`. You'll need version 1.7.1. In an +Anaconda Prompt with the Conda environment ml-agents activated, type in the +following command to uninstall TensorFlow for cpu and install TensorFlow for gpu +_(make sure you are connected to the Internet)_: + +```sh +pip uninstall tensorflow +pip install tensorflow-gpu==1.7.1 +``` + +Lastly, you should test to see if everything installed properly and that +TensorFlow can identify your GPU. In the same Anaconda Prompt, open Python in +the Prompt by calling: + +```sh +python +``` + +And then type the following commands: + +```python +import tensorflow as tf + +sess = tf.Session(config=tf.ConfigProto(log_device_placement=True)) +``` + +You should see something similar to: + +```console +Found device 0 with properties ... +``` + +## Acknowledgments + +We would like to thank +[Jason Weimann](https://unity3d.college/2017/10/25/machine-learning-in-unity3d-setting-up-the-environment-tensorflow-for-agentml-on-windows-10/) +and +[Nitish S. Mutha](http://blog.nitishmutha.com/tensorflow/2017/01/22/TensorFlow-with-gpu-for-windows.html) +for writing the original articles which were used to create this guide. diff --git a/com.unity.ml-agents/Documentation~/Installation.md b/com.unity.ml-agents/Documentation~/Installation.md new file mode 100644 index 0000000000..feb48c4d88 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Installation.md @@ -0,0 +1,216 @@ +# Installation + +The ML-Agents Toolkit contains several components: + +- Unity package ([`com.unity.ml-agents`](../com.unity.ml-agents/)) contains the + Unity C# SDK that will be integrated into your Unity project. This package contains + a sample to help you get started with ML-Agents, including advanced features like + custom sensors, input system integration, and physics-based components. +- Two Python packages: + - [`mlagents`](../ml-agents/) contains the machine learning algorithms that + enables you to train behaviors in your Unity scene. Most users of ML-Agents + will only need to directly install `mlagents`. + - [`mlagents_envs`](../ml-agents-envs/) contains a set of Python APIs to interact with + a Unity scene. It is a foundational layer that facilitates data messaging + between Unity scene and the Python machine learning algorithms. + Consequently, `mlagents` depends on `mlagents_envs`. +- Unity [Project](https://github.com/Unity-Technologies/ml-agents/tree/main/Project/) that contains several + [example environments](Learning-Environment-Examples.md) that highlight the + various features of the toolkit to help you get started. + +Consequently, to install and use the ML-Agents Toolkit you will need to: + +- Install Unity (6000.0 or later) +- Install Python (>= 3.10.1, <=3.10.12) - we recommend using 3.10.12 +- Clone this repository (Recommended for the latest version and bug fixes) + - __Note:__ If you do not clone the repository, then you will not be + able to access the example environments and training configurations. + Additionally, the [Getting Started Guide](Getting-Started.md) assumes that you have cloned the + repository. +- Install the `com.unity.ml-agents` Unity package +- Install the `mlagents-envs` +- Install the `mlagents` Python package + +### Install **Unity 6000.0** or Later + +[Download](https://unity3d.com/get-unity/download) and install Unity. We +strongly recommend that you install Unity through the Unity Hub as it will +enable you to manage multiple Unity versions. 
+ +### Install **Python 3.10.12** + +We recommend [installing](https://www.python.org/downloads/) Python 3.10.12. +If you are using Windows, please install the x86-64 version and not x86. +If your Python environment doesn't include `pip3`, see these +[instructions](https://packaging.python.org/guides/installing-using-linux-tools/#installing-pip-setuptools-wheel-with-linux-package-managers) +on installing it. We also recommend using [conda](https://docs.conda.io/en/latest/) or [mamba](https://github.com/mamba-org/mamba) to manage your python virtual environments. + +#### Conda python setup + +Once conda has been installed in your system, open a terminal and execute the following commands to setup a python 3.10.12 virtual environment +and activate it. + +```shell +conda create -n mlagents python=3.10.12 && conda activate mlagents +``` + +### Clone the ML-Agents Toolkit Repository (Recommended) + +Now that you have installed Unity and Python, you can now install the Unity and +Python packages. You do not need to clone the repository to install those +packages, but you may choose to clone the repository if you'd like download our +example environments and training configurations to experiment with them (some +of our tutorials / guides assume you have access to our example environments). + +**NOTE:** There are samples shipped with the Unity Package. You only need to clone +the repository if you would like to explore more examples. + +```sh +git clone --branch release_22 https://github.com/Unity-Technologies/ml-agents.git +``` + +The `--branch release_22` option will switch to the tag of the latest stable +release. Omitting that will get the `develop` branch which is potentially unstable. +However, if you find that a release branch does not work, the recommendation is to use +the `develop` branch as it may have potential fixes for bugs and dependency issues. + +(Optional to get bleeding edge) + +```sh +git clone https://github.com/Unity-Technologies/ml-agents.git +``` + +#### Advanced: Local Installation for Development + +You will need to clone the repository if you plan to modify or extend the +ML-Agents Toolkit for your purposes. If you plan to contribute those changes +back, make sure to clone the `develop` branch (by omitting `--branch release_22` +from the command above). See our +[Contributions Guidelines](../com.unity.ml-agents/CONTRIBUTING.md) for more +information on contributing to the ML-Agents Toolkit. + +### Install the `com.unity.ml-agents` Unity package + +The Unity ML-Agents C# SDK is a Unity Package. You can install the +`com.unity.ml-agents` package +[directly from the Package Manager registry](https://docs.unity3d.com/Manual/upm-ui-install.html). +Please make sure you enable 'Preview Packages' in the 'Advanced' dropdown in +order to find the latest Preview release of the package. + +**NOTE:** If you do not see the ML-Agents package listed in the Package Manager +please follow the [advanced installation instructions](#advanced-local-installation-for-development) below. + +#### Advanced: Local Installation for Development + +You can [add the local](https://docs.unity3d.com/Manual/upm-ui-local.html) +`com.unity.ml-agents` package (from the repository that you just cloned) to your +project by: + +1. navigating to the menu `Window` -> `Package Manager`. +1. In the package manager window click on the `+` button on the top left of the packages list). +1. Select `Add package from disk...` +1. Navigate into the `com.unity.ml-agents` folder. +1. Select the `package.json` file. + +
+_(Images: Unity Package Manager Window, package.json)_
+ +If you are going to follow the examples from our documentation, you can open the +`Project` folder in Unity and start tinkering immediately. + +### Install the `mlagents` Python package + +Installing the `mlagents` Python package involves installing other Python +packages that `mlagents` depends on. So you may run into installation issues if +your machine has older versions of any of those dependencies already installed. +Consequently, our supported path for installing `mlagents` is to leverage Python +Virtual Environments. Virtual Environments provide a mechanism for isolating the +dependencies for each project and are supported on Mac / Windows / Linux. We +offer a dedicated [guide on Virtual Environments](Using-Virtual-Environment.md). + +#### (Windows) Installing PyTorch + +On Windows, you'll have to install the PyTorch package separately prior to +installing ML-Agents in order to make sure the cuda-enabled version is used, +rather than the CPU-only version. Activate your virtual environment and run from +the command line: + +```sh +pip3 install torch~=2.2.1 --index-url https://download.pytorch.org/whl/cu121 +``` + +Note that on Windows, you may also need Microsoft's +[Visual C++ Redistributable](https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads) +if you don't have it already. See the [PyTorch installation guide](https://pytorch.org/get-started/locally/) +for more installation options and versions. + +#### Installing `mlagents` + +To install the `mlagents` Python package, activate your virtual environment and +run from the command line: + +```sh +cd /path/to/ml-agents +python -m pip install ./ml-agents-envs +python -m pip install ./ml-agents +``` + +Note that this will install `mlagents` from the cloned repository, _not_ from the PyPi +repository. If you installed this correctly, you should be able to run +`mlagents-learn --help`, after which you will see the command +line parameters you can use with `mlagents-learn`. + +**NOTE:** Since ML-Agents development has slowed, PyPi releases will be less frequent. However, you can install from PyPi by executing +the following command: + +```shell +python -m pip install mlagents==1.1.0 +``` + +which will install the latest version of ML-Agents and associated dependencies available on PyPi. Note, you need to have the matching version of +the Unity packages with the particular release of the python packages. You can find the release history [here](https://github.com/Unity-Technologies/ml-agents/releases) + +By installing the `mlagents` package, the dependencies listed in the +[setup.py file](../ml-agents/setup.py) are also installed. These include +[PyTorch](Background-PyTorch.md). + +#### Advanced: Local Installation for Development + +If you intend to make modifications to `mlagents` or `mlagents_envs`, you should +install the packages from the cloned repository rather than from PyPi. To do +this, you will need to install `mlagents` and `mlagents_envs` separately. From +the repository's root directory, run: + +```sh +pip3 install torch -f https://download.pytorch.org/whl/torch_stable.html +pip3 install -e ./ml-agents-envs +pip3 install -e ./ml-agents +``` + +Running pip with the `-e` flag will let you make changes to the Python files +directly and have those reflected when you run `mlagents-learn`. It is important +to install these packages in this order as the `mlagents` package depends on +`mlagents_envs`, and installing it in the other order will download +`mlagents_envs` from PyPi. 
+ +## Next Steps + +The [Getting Started](Getting-Started.md) guide contains several short tutorials +on setting up the ML-Agents Toolkit within Unity, running a pre-trained model, +in addition to building and training environments. + +## Help + +If you run into any problems regarding ML-Agents, refer to our [FAQ](FAQ.md) and +our [Limitations](Limitations.md) pages. If you can't find anything please +[submit an issue](https://github.com/Unity-Technologies/ml-agents/issues) and +make sure to cite relevant information on OS, Python version, and exact error +message (whenever possible). diff --git a/com.unity.ml-agents/Documentation~/Integrations-Match3.md b/com.unity.ml-agents/Documentation~/Integrations-Match3.md new file mode 100644 index 0000000000..215ed55930 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Integrations-Match3.md @@ -0,0 +1,104 @@ +# Match-3 with ML-Agents + + + +## Getting started +The C# code for Match-3 exists inside of the Unity package (`com.unity.ml-agents`). +The good first step would be to take a look at how we have implemented the C# code in the example Match-3 scene (located +under /Project/Assets/ML-Agents/Examples/match3). Once you have some familiarity, then the next step would be to +implement the C# code for Match-3 from the extensions package. + +Additionally, see below for additional technical specifications on the C# code for Match-3. Please note the Match-3 game +isn't human playable as implemented and can be only played via training. + +## Technical specifications for Match-3 with ML-Agents + +### AbstractBoard class +The `AbstractBoard` is the bridge between ML-Agents and your game. It allows ML-Agents to +* ask your game what the current and maximum sizes (rows, columns, and potential piece types) of the board are +* ask your game what the "color" of a cell is +* ask whether the cell is a "special" piece type or not +* ask your game whether a move is allowed +* request that your game make a move + +These are handled by implementing the abstract methods of `AbstractBoard`. + +##### `public abstract BoardSize GetMaxBoardSize()` +Returns the largest `BoardSize` that the game can use. This is used to determine the sizes of observations and sensors, +so don't make it larger than necessary. + +##### `public virtual BoardSize GetCurrentBoardSize()` +Returns the current size of the board. Each field on this BoardSize must be less than or equal to the corresponding +field returned by `GetMaxBoardSize()`. This method is optional; if your always use the same size board, you don't +need to override it. + +If the current board size is smaller than the maximum board size, `GetCellType()` and `GetSpecialType()` will not be +called for cells outside the current board size, and `IsValidMove` won't be called for moves that would go outside of +the current board size. + +##### `public abstract int GetCellType(int row, int col)` +Returns the "color" of piece at the given row and column. +This should be between 0 and BoardSize.NumCellTypes-1 (inclusive). +The actual order of the values doesn't matter. + +##### `public abstract int GetSpecialType(int row, int col)` +Returns the special type of the piece at the given row and column. +This should be between 0 and BoardSize.NumSpecialTypes (inclusive). +The actual order of the values doesn't matter. + +##### `public abstract bool IsMoveValid(Move m)` +Check whether the particular `Move` is valid for the game. 
+The actual results will depend on the rules of the game, but we provide the `SimpleIsMoveValid()` method +that handles basic match3 rules with no special or immovable pieces. + +##### `public abstract bool MakeMove(Move m)` +Instruct the game to make the given move. Returns true if the move was made. +Note that during training, a move that was marked as invalid may occasionally still be +requested. If this happens, it is safe to do nothing and request another move. + +### `Move` struct +The Move struct encapsulates a swap of two adjacent cells. You can get the number of potential moves +for a board of a given size with. `Move.NumPotentialMoves(maxBoardSize)`. There are two helper +functions to create a new `Move`: +* `public static Move FromMoveIndex(int moveIndex, BoardSize maxBoardSize)` can be used to +iterate over all potential moves for the board by looping from 0 to `Move.NumPotentialMoves()` +* `public static Move FromPositionAndDirection(int row, int col, Direction dir, BoardSize maxBoardSize)` creates +a `Move` from a row, column, and direction (and board size). + +### `BoardSize` struct +Describes the "size" of the board, including the number of potential piece types that the board can have. +This is returned by the AbstractBoard.GetMaxBoardSize() and GetCurrentBoardSize() methods. + +#### `Match3Sensor` and `Match3SensorComponent` classes +The `Match3Sensor` generates observations about the state using the `AbstractBoard` interface. You can +choose whether to use vector or "visual" observations; in theory, visual observations should perform +better because they are 2-dimensional like the board, but we need to experiment more on this. + +A `Match3SensorComponent` generates `Match3Sensor`s (the exact number of sensors depends on your configuration) +at runtime, and should be added to the same GameObject as your `Agent` implementation. You do not need to write any +additional code to use them. + +#### `Match3Actuator` and `Match3ActuatorComponent` classes +The `Match3Actuator` converts actions from training or inference into a `Move` that is sent to` AbstractBoard.MakeMove()` +It also checks `AbstractBoard.IsMoveValid` for each potential move and uses this to set the action mask for Agent. + +A `Match3ActuatorComponent` generates a `Match3Actuator` at runtime, and should be added to the same GameObject +as your `Agent` implementation. You do not need to write any additional code to use them. + +### Setting up Match-3 simulation +* Implement the `AbstractBoard` methods to integrate with your game. +* Give the `Agent` rewards when it does what you want it to (match multiple pieces in a row, clears pieces of a certain +type, etc). +* Add the `Agent`, `AbstractBoard` implementation, `Match3SensorComponent`, and `Match3ActuatorComponent` to the same +`GameObject`. +* Call `Agent.RequestDecision()` when you're ready for the `Agent` to make a move on the next `Academy` step. During +the next `Academy` step, the `MakeMove()` method on the board will be called. + +## Implementation Details + +### Action Space +The indexing for actions is the same as described in +[Human Like Playtesting with Deep Learning](https://www.researchgate.net/publication/328307928_Human-Like_Playtesting_with_Deep_Learning) +(for example, Figure 2b). The horizontal moves are enumerated first, then the vertical ones. 
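+
+Putting the pieces above together, here is a minimal sketch of an `AbstractBoard`
+implementation backed by a plain 2D array. Only the abstract method signatures are taken
+from the sections above; the `Unity.MLAgents.Integrations.Match3` namespace, the
+`Rows`/`Columns`/`NumCellTypes`/`NumSpecialTypes` field names on `BoardSize`, and the
+game-side state (`m_Cells`) are assumptions for illustration, so check the package source
+for the exact API in your version.
+
+```csharp
+using Unity.MLAgents.Integrations.Match3;  // assumed namespace for AbstractBoard, BoardSize, Move
+
+public class MyMatch3Board : AbstractBoard
+{
+    const int k_Rows = 9;
+    const int k_Columns = 9;
+    const int k_CellTypes = 6;
+
+    // Illustrative game state: one cell type per position, filled in by your own game code.
+    readonly int[,] m_Cells = new int[k_Columns, k_Rows];
+
+    public override BoardSize GetMaxBoardSize()
+    {
+        return new BoardSize
+        {
+            Rows = k_Rows,
+            Columns = k_Columns,
+            NumCellTypes = k_CellTypes,
+            NumSpecialTypes = 0
+        };
+    }
+
+    public override int GetCellType(int row, int col)
+    {
+        return m_Cells[col, row];
+    }
+
+    public override int GetSpecialType(int row, int col)
+    {
+        return 0;  // no special pieces in this sketch
+    }
+
+    public override bool IsMoveValid(Move m)
+    {
+        // Provided helper that handles basic match-3 rules (see IsMoveValid above).
+        return SimpleIsMoveValid(m);
+    }
+
+    public override bool MakeMove(Move m)
+    {
+        if (!SimpleIsMoveValid(m))
+            return false;
+        // Swap the two cells in your own game state here, then return true.
+        return true;
+    }
+}
+```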
+ + diff --git a/com.unity.ml-agents/Documentation~/Integrations.md b/com.unity.ml-agents/Documentation~/Integrations.md new file mode 100644 index 0000000000..4a007c1866 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Integrations.md @@ -0,0 +1,10 @@ +# Game Integrations +ML-Agents provides some utilities to make it easier to integrate with some common genres of games. + +## Match-3 +The [Match-3 integration](Integrations-Match3.md) provides an abstraction of a match-3 game board and moves, along with +a sensor to observe the game state, and an actuator to translate the ML-Agent actions into game moves. + +## Interested in more game templates? +Do you have a type of game you are interested for ML-Agents? If so, please post a +[forum issue](https://forum.unity.com/forums/ml-agents.453/) with `[GAME TEMPLATE]` in the title. diff --git a/com.unity.ml-agents/Documentation~/LICENSE.md b/com.unity.ml-agents/Documentation~/LICENSE.md new file mode 100644 index 0000000000..b5bebb551d --- /dev/null +++ b/com.unity.ml-agents/Documentation~/LICENSE.md @@ -0,0 +1 @@ +{!../LICENSE.md!} diff --git a/com.unity.ml-agents/Documentation~/Learning-Environment-Create-New.md b/com.unity.ml-agents/Documentation~/Learning-Environment-Create-New.md new file mode 100644 index 0000000000..a56e402d1d --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Learning-Environment-Create-New.md @@ -0,0 +1,479 @@ +# Making a New Learning Environment + +This tutorial walks through the process of creating a Unity Environment from +scratch. We recommend first reading the [Getting Started](Getting-Started.md) +guide to understand the concepts presented here first in an already-built +environment. + +![A simple ML-Agents environment](images/mlagents-NewTutSplash.png) + +In this example, we will create an agent capable of controlling a ball on a +platform. We will then train the agent to roll the ball toward the cube while +avoiding falling off the platform. + +## Overview + +Using the ML-Agents Toolkit in a Unity project involves the following basic +steps: + +1. Create an environment for your agents to live in. An environment can range + from a simple physical simulation containing a few objects to an entire game + or ecosystem. +1. Implement your Agent subclasses. An Agent subclass defines the code an Agent + uses to observe its environment, to carry out assigned actions, and to + calculate the rewards used for reinforcement training. You can also implement + optional methods to reset the Agent when it has finished or failed its task. +1. Add your Agent subclasses to appropriate GameObjects, typically, the object + in the scene that represents the Agent in the simulation. + +**Note:** If you are unfamiliar with Unity, refer to the +[Unity manual](https://docs.unity3d.com/Manual/index.html) +if an Editor task isn't explained sufficiently in this tutorial. + +If you haven't already, follow the [installation instructions](Installation.md). + +## Set Up the Unity Project + +The first task to accomplish is simply creating a new Unity project and +importing the ML-Agents assets into it: + +1. Launch Unity Hub and create a new 3D project named "RollerBall". +1. [Add the ML-Agents Unity package](Installation.md#install-the-comunityml-agents-unity-package) + to your project. 
+ +Your Unity **Project** window should contain the following assets: + +![Unity Project Window](images/roller-ball-projects.png){: style="width:250px"} + +## Create the Environment + +Next, we will create a very simple scene to act as our learning environment. The +"physical" components of the environment include a Plane to act as the floor for +the Agent to move around on, a Cube to act as the goal or target for the agent +to seek, and a Sphere to represent the Agent itself. + +### Create the Floor Plane + +1. Right click in Hierarchy window, select 3D Object > Plane. +1. Name the GameObject "Floor". +1. Select the Floor Plane to view its properties in the Inspector window. +1. Set Transform to Position = `(0, 0, 0)`, Rotation = `(0, 0, 0)`, Scale = + `(1, 1, 1)`. + +![Floor Inspector window](images/roller-ball-floor.png){: style="width:400px"} + +### Add the Target Cube + +1. Right click in Hierarchy window, select 3D Object > Cube. +1. Name the GameObject "Target". +1. Select the Target Cube to view its properties in the Inspector window. +1. Set Transform to Position = `(3, 0.5, 3)`, Rotation = `(0, 0, 0)`, Scale = + `(1, 1, 1)`. + +![Target Cube Inspector window](images/roller-ball-target.png){: style="width:400px"} + +### Add the Agent Sphere + +1. Right click in Hierarchy window, select 3D Object > Sphere. +1. Name the GameObject "RollerAgent". +1. Select the RollerAgent Sphere to view its properties in the Inspector window. +1. Set Transform to Position = `(0, 0.5, 0)`, Rotation = `(0, 0, 0)`, Scale = + `(1, 1, 1)`. +1. Click **Add Component**. +1. Add the `Rigidbody` component to the Sphere. + +### Group into Training Area + +Group the floor, target and agent under a single, empty, GameObject. This will simplify +some of our subsequent steps. + +To do so: + +1. Right-click on your Project Hierarchy and create a new empty GameObject. Name + it TrainingArea. +1. Reset the TrainingArea’s Transform so that it is at `(0,0,0)` with Rotation + `(0,0,0)` and Scale `(1,1,1)`. +1. Drag the Floor, Target, and RollerAgent GameObjects in the Hierarchy into the + TrainingArea GameObject. + +![Hierarchy window](images/roller-ball-hierarchy.png){: style="width:250px"} + +## Implement an Agent + +To create the Agent Script: + +1. Select the RollerAgent GameObject to view it in the Inspector window. +1. Click **Add Component**. +1. Click **New Script** in the list of components (at the bottom). +1. Name the script "RollerAgent". +1. Click **Create and Add**. + +Then, edit the new `RollerAgent` script: + +1. In the Unity Project window, double-click the `RollerAgent` script to open it + in your code editor. +1. Import ML-Agent package by adding + + ```csharp + using Unity.MLAgents; + using Unity.MLAgents.Sensors; + using Unity.MLAgents.Actuators; + ``` + then change the base class from `MonoBehaviour` to `Agent`. +1. Delete `Update()` since we are not using it, but keep `Start()`. + +So far, these are the basic steps that you would use to add ML-Agents to any +Unity project. Next, we will add the logic that will let our Agent learn to roll +to the cube using reinforcement learning. More specifically, we will need to +extend three methods from the `Agent` base class: + +- `OnEpisodeBegin()` +- `CollectObservations(VectorSensor sensor)` +- `OnActionReceived(ActionBuffers actionBuffers)` + +We overview each of these in more detail in the dedicated subsections below. 
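+
+Before filling these in, it can help to see the bare shape of the class. The skeleton below
+is just a sketch of where the code from the following subsections will go; the method
+signatures match the snippets shown later in this tutorial.
+
+```csharp
+using UnityEngine;
+using Unity.MLAgents;
+using Unity.MLAgents.Actuators;
+using Unity.MLAgents.Sensors;
+
+public class RollerAgent : Agent
+{
+    public override void OnEpisodeBegin()
+    {
+        // Reset the Agent and move the Target at the start of each episode (next subsection).
+    }
+
+    public override void CollectObservations(VectorSensor sensor)
+    {
+        // Add the observations the Policy needs to make a decision.
+    }
+
+    public override void OnActionReceived(ActionBuffers actionBuffers)
+    {
+        // Apply the received actions and assign rewards.
+    }
+}
+```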
+ +### Initialization and Resetting the Agent + +The process of training in the ML-Agents Toolkit involves running episodes where +the Agent (Sphere) attempts to solve the task. Each episode lasts until the +Agents solves the task (i.e. reaches the cube), fails (rolls off the platform) +or times out (takes too long to solve or fail at the task). At the start of each +episode, `OnEpisodeBegin()` is called to set-up the environment for a +new episode. Typically the scene is initialized in a random manner to enable the +agent to learn to solve the task under a variety of conditions. + +In this example, each time the Agent (Sphere) reaches its target (Cube), the +episode ends and the target (Cube) is moved to a new random location; and if +the Agent rolls off the platform, it will be put back onto the floor. +These are all handled in `OnEpisodeBegin()`. + +To move the target (Cube), we need a reference to its Transform (which stores a +GameObject's position, orientation and scale in the 3D world). To get this +reference, add a public field of type `Transform` to the RollerAgent class. +Public fields of a component in Unity get displayed in the Inspector window, +allowing you to choose which GameObject to use as the target in the Unity +Editor. + +To reset the Agent's velocity (and later to apply force to move the agent) we +need a reference to the Rigidbody component. A +[Rigidbody](https://docs.unity3d.com/ScriptReference/Rigidbody.html) is Unity's +primary element for physics simulation. (See +[Physics](https://docs.unity3d.com/Manual/PhysicsSection.html) for full +documentation of Unity physics.) Since the Rigidbody component is on the same +GameObject as our Agent script, the best way to get this reference is using +`GameObject.GetComponent()`, which we can call in our script's `Start()` +method. + +So far, our RollerAgent script looks like: + +```csharp +using System.Collections.Generic; +using UnityEngine; +using Unity.MLAgents; +using Unity.MLAgents.Sensors; + +public class RollerAgent : Agent +{ + Rigidbody rBody; + void Start () { + rBody = GetComponent(); + } + + public Transform Target; + public override void OnEpisodeBegin() + { + // If the Agent fell, zero its momentum + if (this.transform.localPosition.y < 0) + { + this.rBody.angularVelocity = Vector3.zero; + this.rBody.velocity = Vector3.zero; + this.transform.localPosition = new Vector3( 0, 0.5f, 0); + } + + // Move the target to a new spot + Target.localPosition = new Vector3(Random.value * 8 - 4, + 0.5f, + Random.value * 8 - 4); + } +} +``` + +Next, let's implement the `Agent.CollectObservations(VectorSensor sensor)` +method. + +### Observing the Environment + +The Agent sends the information we collect to the Brain, which uses it to make a +decision. When you train the Agent (or use a trained model), the data is fed +into a neural network as a feature vector. For an Agent to successfully learn a +task, we need to provide the correct information. A good rule of thumb for +deciding what information to collect is to consider what you would need to +calculate an analytical solution to the problem. + +In our case, the information our Agent collects includes the position of the +target, the position of the agent itself, and the velocity of the agent. This +helps the Agent learn to control its speed so it doesn't overshoot the target +and roll off the platform. 
In total, the agent observation contains 8 values as +implemented below: + +```csharp +public override void CollectObservations(VectorSensor sensor) +{ + // Target and Agent positions + sensor.AddObservation(Target.localPosition); + sensor.AddObservation(this.transform.localPosition); + + // Agent velocity + sensor.AddObservation(rBody.velocity.x); + sensor.AddObservation(rBody.velocity.z); +} +``` + +### Taking Actions and Assigning Rewards + +The final part of the Agent code is the `Agent.OnActionReceived()` method, which +receives actions and assigns the reward. + +#### Actions + +To solve the task of moving towards the target, the Agent (Sphere) needs to be +able to move in the `x` and `z` directions. As such, the agent needs 2 actions: +the first determines the force applied along the x-axis; and the +second determines the force applied along the z-axis. (If we allowed the Agent +to move in three dimensions, then we would need a third action.) + +The RollerAgent applies the values from the `action[]` array to its Rigidbody +component `rBody`, using `Rigidbody.AddForce()`: + +```csharp +Vector3 controlSignal = Vector3.zero; +controlSignal.x = action[0]; +controlSignal.z = action[1]; +rBody.AddForce(controlSignal * forceMultiplier); +``` + +#### Rewards + +Reinforcement learning requires rewards to signal which decisions are good and +which are bad. The learning algorithm uses the rewards to determine whether it +is giving the Agent the optimal actions. You want to reward an Agent for +completing the assigned task. In this case, the Agent is given a reward of 1.0 +for reaching the Target cube. + +Rewards are assigned in `OnActionReceived()`. The RollerAgent +calculates the distance to detect when it reaches the target. +When it does, the code calls `Agent.SetReward()` to assign a reward +of 1.0 and marks the agent as finished by calling `EndEpisode()` on +the Agent. + +```csharp +float distanceToTarget = Vector3.Distance(this.transform.localPosition, Target.localPosition); +// Reached target +if (distanceToTarget < 1.42f) +{ + SetReward(1.0f); + EndEpisode(); +} +``` + +Finally, if the Agent falls off the platform, end the episode so that it can +reset itself: + +```csharp +// Fell off platform +if (this.transform.localPosition.y < 0) +{ + EndEpisode(); +} +``` + +#### OnActionReceived() + +With the action and reward logic outlined above, the final version of +`OnActionReceived()` looks like: + +```csharp +public float forceMultiplier = 10; +public override void OnActionReceived(ActionBuffers actionBuffers) +{ + // Actions, size = 2 + Vector3 controlSignal = Vector3.zero; + controlSignal.x = actionBuffers.ContinuousActions[0]; + controlSignal.z = actionBuffers.ContinuousActions[1]; + rBody.AddForce(controlSignal * forceMultiplier); + + // Rewards + float distanceToTarget = Vector3.Distance(this.transform.localPosition, Target.localPosition); + + // Reached target + if (distanceToTarget < 1.42f) + { + SetReward(1.0f); + EndEpisode(); + } + + // Fell off platform + else if (this.transform.localPosition.y < 0) + { + EndEpisode(); + } +} +``` + +Note the `forceMultiplier` class variable is defined before the method definition. +Since `forceMultiplier` is public, you can set the value from the Inspector window. + +## Final Agent Setup in Editor + +Now that all the GameObjects and ML-Agent components are in place, it is time +to connect everything together in the Unity Editor. 
This involves adding and +setting some of the Agent Component's properties so that they are compatible +with our Agent script. + +1. Select the **RollerAgent** GameObject to show its properties in the Inspector + window. +1. Drag the Target GameObject in the Hierarchy into the `Target` field in RollerAgent Script. +1. Add a `Decision Requester` script with the **Add Component** button. + Set the **Decision Period** to `10`. For more information on decisions, + see [the Agent documentation](Learning-Environment-Design-Agents.md#decisions) +1. Add a `Behavior Parameters` script with the **Add Component** button. + Set the Behavior Parameters of the Agent to the following: + - `Behavior Name`: _RollerBall_ + - `Vector Observation` > `Space Size` = 8 + - `Actions` > `Continuous Actions` = 2 + +In the inspector, the `RollerAgent` should look like this now: + +![Agent GameObject Inspector window](images/roller-ball-agent.png){: style="width:400px"} + +Now you are ready to test the environment before training. + +## Testing the Environment + +It is always a good idea to first test your environment by controlling the Agent +using the keyboard. To do so, you will need to extend the `Heuristic()` method +in the `RollerAgent` class. For our example, the heuristic will generate an +action corresponding to the values of the "Horizontal" and "Vertical" input axis +(which correspond to the keyboard arrow keys): + +```csharp +public override void Heuristic(in ActionBuffers actionsOut) +{ + var continuousActionsOut = actionsOut.ContinuousActions; + continuousActionsOut[0] = Input.GetAxis("Horizontal"); + continuousActionsOut[1] = Input.GetAxis("Vertical"); +} +``` + +In order for the Agent to use the Heuristic, You will need to set the +`Behavior Type` to `Heuristic Only` in the `Behavior Parameters` of the +RollerAgent. + +Press **Play** to run the scene and use the arrows keys to move the Agent around +the platform. Make sure that there are no errors displayed in the Unity Editor +Console window and that the Agent resets when it reaches its target or falls +from the platform. + +## Training the Environment + +The process is the same as described in the +[Getting Started Guide](Getting-Started.md). + +The hyperparameters for training are specified in a configuration file that you +pass to the `mlagents-learn` program. Create a new `rollerball_config.yaml` file +under `config/` and include the following hyperparameter values: + +```yml +behaviors: + RollerBall: + trainer_type: ppo + hyperparameters: + batch_size: 10 + buffer_size: 100 + learning_rate: 3.0e-4 + beta: 5.0e-4 + epsilon: 0.2 + lambd: 0.99 + num_epoch: 3 + learning_rate_schedule: linear + beta_schedule: constant + epsilon_schedule: linear + network_settings: + normalize: false + hidden_units: 128 + num_layers: 2 + reward_signals: + extrinsic: + gamma: 0.99 + strength: 1.0 + max_steps: 500000 + time_horizon: 64 + summary_freq: 10000 +``` + +Hyperparameters are explained in [the training configuration file documentation](Training-Configuration-File.md) + +Since this example creates a very simple training environment with only a few +inputs and outputs, using small batch and buffer sizes speeds up the training +considerably. However, if you add more complexity to the environment or change +the reward or observation functions, you might also find that training performs +better with different hyperparameter values. 
In addition to setting these +hyperparameter values, the Agent **DecisionFrequency** parameter has a large +effect on training time and success. A larger value reduces the number of +decisions the training algorithm has to consider and, in this simple +environment, speeds up training. + +To train your agent, run the following command before pressing **Play** in the +Editor: + + mlagents-learn config/rollerball_config.yaml --run-id=RollerBall + +To monitor the statistics of Agent performance during training, use +[TensorBoard](Using-Tensorboard.md). + +![TensorBoard statistics display](images/mlagents-RollerAgentStats.png) + +In particular, the _cumulative_reward_ and _value_estimate_ statistics show how +well the Agent is achieving the task. In this example, the maximum reward an +Agent can earn is 1.0, so these statistics approach that value when the Agent +has successfully _solved_ the problem. + +## Optional: Multiple Training Areas within the Same Scene + +In many of the [example environments](Learning-Environment-Examples.md), many +copies of the training area are instantiated in the scene. This generally speeds +up training, allowing the environment to gather many experiences in parallel. +This can be achieved simply by instantiating many Agents with the same +`Behavior Name`. Note that we've already simplified our transition to using +multiple areas by creating the `TrainingArea` GameObject and relying on local +positions in `RollerAgent.cs`. Use the following steps to parallelize your +RollerBall environment: + +1. Drag the TrainingArea GameObject, along with its attached GameObjects, into + your Assets browser, turning it into a prefab. +1. You can now instantiate copies of the TrainingArea prefab. Drag them into + your scene, positioning them so that they do not overlap. + +Alternatively, you can use the `TrainingAreaReplicator` to replicate training areas. Use the following steps: + +1. Create a new empty Game Object in the scene. +2. Click on the new object and add a TrainingAreaReplicator component to the empty Game Object through the inspector. +3. Drag the training area to `Base Area` in the Training Area Replicator. +4. Specify the number of areas to replicate and the separation between areas. +5. Hit play and the areas will be replicated automatically! + +## Optional: Training Using Concurrent Unity Instances +Another level of parallelization comes by training using +[concurrent Unity instances](ML-Agents-Overview.md#additional-features). +For example, + +``` +mlagents-learn config/rollerball_config.yaml --run-id=RollerBall --num-envs=2 +``` + +will start ML Agents training with two environment instances. Combining multiple +training areas within the same scene, with concurrent Unity instances, effectively +gives you two levels of parallelism to speed up training. The command line option +`--num-envs=` controls the number of concurrent Unity instances that are +executed in parallel during training. 
diff --git a/com.unity.ml-agents/Documentation~/Learning-Environment-Design-Agents.md b/com.unity.ml-agents/Documentation~/Learning-Environment-Design-Agents.md new file mode 100644 index 0000000000..dbfca2f53c --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Learning-Environment-Design-Agents.md @@ -0,0 +1,1185 @@ +# Agents + +**Table of Contents:** + +- [Agents](#agents) + - [Decisions](#decisions) + - [Observations and Sensors](#observations-and-sensors) + - [Generating Observations](#generating-observations) + - [Agent.CollectObservations()](#agentcollectobservations) + - [Observable Fields and Properties](#observable-fields-and-properties) + - [ISensor interface and SensorComponents](#isensor-interface-and-sensorcomponents) + - [Vector Observations](#vector-observations) + - [One-hot encoding categorical information](#one-hot-encoding-categorical-information) + - [Normalization](#normalization) + - [Stacking](#stacking) + - [Vector Observation Summary \& Best Practices](#vector-observation-summary--best-practices) + - [Visual Observations](#visual-observations) + - [Visual Observation Summary \& Best Practices](#visual-observation-summary--best-practices) + - [Raycast Observations](#raycast-observations) + - [RayCast Observation Summary \& Best Practices](#raycast-observation-summary--best-practices) + - [Grid Observations](#grid-observations) + - [Grid Observation Summary \& Best Practices](#grid-observation-summary--best-practices) + - [Variable Length Observations](#variable-length-observations) + - [Variable Length Observation Summary \& Best Practices](#variable-length-observation-summary--best-practices) + - [Goal Signal](#goal-signal) + - [Goal Signal Summary \& Best Practices](#goal-signal-summary--best-practices) + - [Actions and Actuators](#actions-and-actuators) + - [Continuous Actions](#continuous-actions) + - [Discrete Actions](#discrete-actions) + - [Masking Discrete Actions](#masking-discrete-actions) + - [IActuator interface and ActuatorComponents](#iactuator-interface-and-actuatorcomponents) + - [Actions Summary \& Best Practices](#actions-summary--best-practices) + - [Rewards](#rewards) + - [Examples](#examples) + - [Rewards Summary \& Best Practices](#rewards-summary--best-practices) + - [Agent Properties](#agent-properties) + - [Destroying an Agent](#destroying-an-agent) + - [Defining Multi-agent Scenarios](#defining-multi-agent-scenarios) + - [Teams for Adversarial Scenarios](#teams-for-adversarial-scenarios) + - [Groups for Cooperative Scenarios](#groups-for-cooperative-scenarios) + - [Cooperative Behaviors Notes and Best Practices](#cooperative-behaviors-notes-and-best-practices) + - [Recording Demonstrations](#recording-demonstrations) + +An agent is an entity that can observe its environment, decide on the best +course of action using those observations, and execute those actions within its +environment. Agents can be created in Unity by extending the `Agent` class. The +most important aspects of creating agents that can successfully learn are the +observations the agent collects, and the reward you assign to estimate the value +of the agent's current state toward accomplishing its tasks. + +An Agent passes its observations to its Policy. The Policy then makes a decision +and passes the chosen action back to the agent. Your agent code must execute the +action, for example, move the agent in one direction or another. 
In order to +[train an agent using reinforcement learning](Learning-Environment-Design.md), +your agent must calculate a reward value at each action. The reward is used to +discover the optimal decision-making policy. + +The `Policy` class abstracts out the decision making logic from the Agent itself +so that you can use the same Policy in multiple Agents. How a Policy makes its +decisions depends on the `Behavior Parameters` associated with the agent. If you +set `Behavior Type` to `Heuristic Only`, the Agent will use its `Heuristic()` +method to make decisions which can allow you to control the Agent manually or +write your own Policy. If the Agent has a `Model` file, its Policy will use the +neural network `Model` to take decisions. + +When you create an Agent, you should usually extend the base Agent class. This +includes implementing the following methods: + +- `Agent.OnEpisodeBegin()` — Called at the beginning of an Agent's episode, + including at the beginning of the simulation. +- `Agent.CollectObservations(VectorSensor sensor)` — Called every step that the Agent + requests a decision. This is one possible way for collecting the Agent's + observations of the environment; see [Generating Observations](#generating-observations) + below for more options. +- `Agent.OnActionReceived()` — Called every time the Agent receives an action to + take. Receives the action chosen by the Agent. It is also common to assign a + reward in this method. +- `Agent.Heuristic()` - When the `Behavior Type` is set to `Heuristic Only` in + the Behavior Parameters of the Agent, the Agent will use the `Heuristic()` + method to generate the actions of the Agent. As such, the `Heuristic()` method + writes to the array of floats provided to the Heuristic method as argument. + __Note__: Do not create a new float array of action in the `Heuristic()` method, + as this will prevent writing floats to the original action array. + +As a concrete example, here is how the Ball3DAgent class implements these methods: + +- `Agent.OnEpisodeBegin()` — Resets the agent cube and ball to their starting + positions. The function randomizes the reset values so that the training + generalizes to more than a specific starting position and agent cube + orientation. +- `Agent.CollectObservations(VectorSensor sensor)` — Adds information about the + orientation of the agent cube, the ball velocity, and the relative position + between the ball and the cube. Since the `CollectObservations()` + method calls `VectorSensor.AddObservation()` such that vector size adds up to 8, + the Behavior Parameters of the Agent are set with vector observation space + with a state size of 8. +- `Agent.OnActionReceived()` — The action results + in a small change in the agent cube's rotation at each step. In this example, + an Agent receives a small positive reward for each step it keeps the ball on the + agent cube's head and a larger, negative reward for dropping the ball. An + Agent's episode is also ended when it drops the ball so that it will reset + with a new ball for the next simulation step. +- `Agent.Heuristic()` - Converts the keyboard inputs into actions. + +## Decisions + +The observation-decision-action-reward cycle repeats each time the Agent request +a decision. Agents will request a decision when `Agent.RequestDecision()` is +called. If you need the Agent to request decisions on its own at regular +intervals, add a `Decision Requester` component to the Agent's GameObject. 
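+
+If decisions should instead be driven by your own game logic, you can call
+`Agent.RequestDecision()` yourself. A minimal sketch (the class and method names
+are illustrative):
+
+```csharp
+using Unity.MLAgents;
+
+public class TurnBasedAgent : Agent
+{
+    // Called by your game logic whenever it is this agent's turn.
+    public void OnTurnStarted()
+    {
+        // Asks the Policy for a decision; OnActionReceived() will be called
+        // with the chosen action on the next Academy step.
+        RequestDecision();
+    }
+}
+```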
+Making decisions at regular step intervals is generally most appropriate for +physics-based simulations. For example, an agent in a robotic simulator that +must provide fine-control of joint torques should make its decisions every step +of the simulation. In games such as real-time strategy, where many agents make +their decisions at regular intervals, the decision timing for each agent can be +staggered by setting the `DecisionStep` parameter in the `Decision Requester` +component for each agent. On the other hand, an agent that only needs to make +decisions when certain game or simulation events occur, such as in a turn-based +game, should call `Agent.RequestDecision()` manually. + +## Observations and Sensors +In order for an agent to learn, the observations should include all the +information an agent needs to accomplish its task. Without sufficient and +relevant information, an agent may learn poorly or may not learn at all. A +reasonable approach for determining what information should be included is to +consider what you would need to calculate an analytical solution to the problem, +or what you would expect a human to be able to use to solve the problem. + +### Generating Observations +ML-Agents provides multiple ways for an Agent to make observations: + 1. Overriding the `Agent.CollectObservations()` method and passing the + observations to the provided `VectorSensor`. + 1. Adding the `[Observable]` attribute to fields and properties on the Agent. + 1. Implementing the `ISensor` interface, using a `SensorComponent` attached to + the Agent to create the `ISensor`. + +#### Agent.CollectObservations() +Agent.CollectObservations() is best used for aspects of the environment which are +numerical and non-visual. The Policy class calls the +`CollectObservations(VectorSensor sensor)` method of each Agent. Your +implementation of this function must call `VectorSensor.AddObservation` to add +vector observations. + +The `VectorSensor.AddObservation` method provides a number of overloads for +adding common types of data to your observation vector. You can add Integers and +booleans directly to the observation vector, as well as some common Unity data +types such as `Vector2`, `Vector3`, and `Quaternion`. + +For examples of various state observation functions, you can look at the +[example environments](Learning-Environment-Examples.md) included in the +ML-Agents SDK. For instance, the 3DBall example uses the rotation of the +platform, the relative position of the ball, and the velocity of the ball as its +state observation. + +```csharp +public GameObject ball; + +public override void CollectObservations(VectorSensor sensor) +{ + // Orientation of the cube (2 floats) + sensor.AddObservation(gameObject.transform.rotation.z); + sensor.AddObservation(gameObject.transform.rotation.x); + // Relative position of the ball to the cube (3 floats) + sensor.AddObservation(ball.transform.position - gameObject.transform.position); + // Velocity of the ball (3 floats) + sensor.AddObservation(m_BallRb.velocity); + // 8 floats total +} +``` + +As an experiment, you can remove the velocity components from +the observation and retrain the 3DBall agent. While it will learn to balance the +ball reasonably well, the performance of the agent without using velocity is +noticeably worse. + +The observations passed to `VectorSensor.AddObservation()` must always contain +the same number of elements must always be in the same order. 
If the number +of observed entities in an environment can vary, you can pad the calls +with zeros for any missing entities in a specific observation, or you can limit +an agent's observations to a fixed subset. For example, instead of observing +every enemy in an environment, you could only observe the closest five. + +Additionally, when you set up an Agent's `Behavior Parameters` in the Unity +Editor, you must set the **Vector Observations > Space Size** +to equal the number of floats that are written by `CollectObservations()`. + +#### Observable Fields and Properties +Another approach is to define the relevant observations as fields or properties +on your Agent class, and annotate them with an `ObservableAttribute`. For +example, in the Ball3DHardAgent, the difference between positions could be observed +by adding a property to the Agent: +```csharp +using Unity.MLAgents.Sensors.Reflection; + +public class Ball3DHardAgent : Agent { + + [Observable(numStackedObservations: 9)] + Vector3 PositionDelta + { + get + { + return ball.transform.position - gameObject.transform.position; + } + } +} +``` +`ObservableAttribute` currently supports most basic types (e.g. floats, ints, +bools), as well as `Vector2`, `Vector3`, `Vector4`, `Quaternion`, and enums. + +The behavior of `ObservableAttribute`s are controlled by the "Observable Attribute +Handling" in the Agent's `Behavior Parameters`. The possible values for this are: + * **Ignore** (default) - All ObservableAttributes on the Agent will be ignored. + If there are no ObservableAttributes on the Agent, this will result in the + fastest initialization time. + * **Exclude Inherited** - Only members on the declared class will be examined; + members that are inherited are ignored. This is a reasonable tradeoff between + performance and flexibility. + * **Examine All** All members on the class will be examined. This can lead to + slower startup times. + +"Exclude Inherited" is generally sufficient, but if your Agent inherits from +another Agent implementation that has Observable members, you will need to use +"Examine All". + +Internally, ObservableAttribute uses reflection to determine which members of +the Agent have ObservableAttributes, and also uses reflection to access the +fields or invoke the properties at runtime. This may be slower than using +CollectObservations or an ISensor, although this might not be enough to +noticeably affect performance. + +**NOTE**: you do not need to adjust the Space Size in the Agent's +`Behavior Parameters` when you add `[Observable]` fields or properties to an +Agent, since their size can be computed before they are used. + +#### ISensor interface and SensorComponents +The `ISensor` interface is generally intended for advanced users. The `Write()` +method is used to actually generate the observation, but some other methods +such as returning the shape of the observations must also be implemented. + +The `SensorComponent` abstract class is used to create the actual `ISensor` at +runtime. It must be attached to the same `GameObject` as the `Agent`, or to a +child `GameObject`. + +There are several SensorComponents provided in the API, including: +- `CameraSensorComponent` - Uses images from a `Camera` as observations. +- `RenderTextureSensorComponent` - Uses the content of a `RenderTexture` as +observations. +- `RayPerceptionSensorComponent` - Uses the information from set of ray casts +as observations. +- `Match3SensorComponent` - Uses the board of a [Match-3 game](Integrations-Match3.md) +as observations. 
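+
+For example, simple fields can be observed directly as well (the `TurretAgent`
+class and its fields below are hypothetical):
+
+```csharp
+using Unity.MLAgents;
+using Unity.MLAgents.Sensors.Reflection;
+using UnityEngine;
+
+public class TurretAgent : Agent
+{
+    // Each [Observable] member is collected automatically; no change to the
+    // Space Size in Behavior Parameters is needed.
+    [Observable]
+    float m_RemainingAmmo;
+
+    [Observable]
+    Vector3 m_TargetVelocity;
+}
+```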
+- `GridSensorComponent` - Uses a set of box queries in a grid shape as +observations. + +**NOTE**: you do not need to adjust the Space Size in the Agent's +`Behavior Parameters` when using `SensorComponents`s. + +Internally, both `Agent.CollectObservations` and `[Observable]` attribute use an +ISensors to write observations, although this is mostly abstracted from the user. + +### Vector Observations +Both `Agent.CollectObservations()` and `ObservableAttribute`s produce vector +observations, which are represented at lists of `float`s. `ISensor`s can +produce both vector observations and visual observations, which are +multi-dimensional arrays of floats. + +Below are some additional considerations when dealing with vector observations: + +#### One-hot encoding categorical information + +Type enumerations should be encoded in the _one-hot_ style. That is, add an +element to the feature vector for each element of enumeration, setting the +element representing the observed member to one and set the rest to zero. For +example, if your enumeration contains \[Sword, Shield, Bow\] and the agent +observes that the current item is a Bow, you would add the elements: 0, 0, 1 to +the feature vector. The following code example illustrates how to add. + +```csharp +enum ItemType { Sword, Shield, Bow, LastItem } +public override void CollectObservations(VectorSensor sensor) +{ + for (int ci = 0; ci < (int)ItemType.LastItem; ci++) + { + sensor.AddObservation((int)currentItem == ci ? 1.0f : 0.0f); + } +} +``` + +`VectorSensor` also provides a two-argument function `AddOneHotObservation()` as +a shortcut for _one-hot_ style observations. The following example is identical +to the previous one. + +```csharp +enum ItemType { Sword, Shield, Bow, LastItem } +const int NUM_ITEM_TYPES = (int)ItemType.LastItem + 1; + +public override void CollectObservations(VectorSensor sensor) +{ + // The first argument is the selection index; the second is the + // number of possibilities + sensor.AddOneHotObservation((int)currentItem, NUM_ITEM_TYPES); +} +``` + +`ObservableAttribute` has built-in support for enums. Note that you don't need +the `LastItem` placeholder in this case: +```csharp +enum ItemType { Sword, Shield, Bow } + +public class HeroAgent : Agent +{ + [Observable] + ItemType m_CurrentItem; +} +``` + +#### Normalization + +For the best results when training, you should normalize the components of your +feature vector to the range [-1, +1] or [0, 1]. When you normalize the values, +the PPO neural network can often converge to a solution faster. Note that it +isn't always necessary to normalize to these recommended ranges, but it is +considered a best practice when using neural networks. The greater the variation +in ranges between the components of your observation, the more likely that +training will be affected. + +To normalize a value to [0, 1], you can use the following formula: + +```csharp +normalizedValue = (currentValue - minValue)/(maxValue - minValue) +``` + +:warning: For vectors, you should apply the above formula to each component (x, +y, and z). Note that this is _not_ the same as using the `Vector3.normalized` +property or `Vector3.Normalize()` method in Unity (and similar for `Vector2`). + +Rotations and angles should also be normalized. 
For angles between 0 and 360 degrees, you can use the following formulas:
+
+```csharp
+Quaternion rotation = transform.rotation;
+Vector3 normalizedSigned = rotation.eulerAngles / 180.0f - Vector3.one; // [-1, 1]
+Vector3 normalizedUnsigned = rotation.eulerAngles / 360.0f;             // [0, 1]
+```
+
+For angles that can be outside the range [0, 360], you can either reduce the
+angle, or, if the number of turns is significant, increase the maximum value
+used in your normalization formula.
+
+#### Stacking
+Stacking refers to repeating observations from previous steps as part of a
+larger observation. For example, consider an Agent that generates these
+observations over four steps:
+```
+step 1: [0.1]
+step 2: [0.2]
+step 3: [0.3]
+step 4: [0.4]
+```
+
+If we use a stack size of 3, the observations would instead be:
+```
+step 1: [0.1, 0.0, 0.0]
+step 2: [0.2, 0.1, 0.0]
+step 3: [0.3, 0.2, 0.1]
+step 4: [0.4, 0.3, 0.2]
+```
+(The observations are padded with zeroes for the first `stackSize - 1` steps.)
+This is a simple way to give an Agent limited "memory" without the complexity
+of adding a recurrent neural network (RNN).
+
+The steps for enabling stacking depend on how you generate observations:
+* For `Agent.CollectObservations()`, set "Stacked Vectors" on the Agent's
+  `Behavior Parameters` to a value greater than 1.
+* For `ObservableAttribute`, set the `numStackedObservations` parameter in the
+  constructor, e.g. `[Observable(numStackedObservations: 2)]`.
+* For `ISensor`s, wrap them in a `StackingSensor` (which is also an `ISensor`).
+  Generally, this should happen in the `CreateSensor()` method of your
+  `SensorComponent`.
+
+#### Vector Observation Summary & Best Practices
+
+- Vector Observations should include all variables relevant for allowing the
+  agent to make the optimally informed decision, and ideally no extraneous
+  information.
+- In cases where Vector Observations need to be remembered or compared over
+  time, either an RNN should be used in the model, or the `Stacked Vectors`
+  value in the agent GameObject's `Behavior Parameters` should be changed.
+- Categorical variables such as type of object (Sword, Shield, Bow) should be
+  encoded in one-hot fashion (i.e. `3` -> `0, 0, 1`). This can be done
+  automatically using the `AddOneHotObservation()` method of the `VectorSensor`,
+  or using `[Observable]` on an enum field or property of the Agent.
+- In general, all inputs should be normalized to be in the range 0 to +1 (or -1
+  to 1). For example, the `x` position information of an agent where the maximum
+  possible value is `maxValue` should be recorded as
+  `VectorSensor.AddObservation(transform.position.x / maxValue);` rather than
+  `VectorSensor.AddObservation(transform.position.x);`.
+- Positional information of relevant GameObjects should be encoded in relative
+  coordinates wherever possible. This is often relative to the agent position.
+
+### Visual Observations
+
+Visual observations are generally provided to an agent via either a `CameraSensor`
+or `RenderTextureSensor`. These collect image information and transform it into
+a 3D tensor which can be fed into the convolutional neural network (CNN) of the
+agent policy. For more information on CNNs, see
+[this guide](http://cs231n.github.io/convolutional-networks/). This allows
+agents to learn from spatial regularities in the observation images. It is
+possible to use visual and vector observations with the same agent.
+ +Agents using visual observations can capture state of arbitrary complexity and +are useful when the state is difficult to describe numerically. However, they +are also typically less efficient and slower to train, and sometimes don't +succeed at all as compared to vector observations. As such, they should only be +used when it is not possible to properly define the problem using vector or +ray-cast observations. + +Visual observations can be derived from Cameras or RenderTextures within your +scene. To add a visual observation to an Agent, add either a Camera Sensor +Component or RenderTextures Sensor Component to the Agent. Then drag the camera +or render texture you want to add to the `Camera` or `RenderTexture` field. You +can have more than one camera or render texture and even use a combination of +both attached to an Agent. For each visual observation, set the width and height +of the image (in pixels) and whether or not the observation is color or +grayscale. + +![Agent Camera](images/visual-observation.png) + +or + +![Agent RenderTexture](images/visual-observation-rendertexture.png) + +Each Agent that uses the same Policy must have the same number of visual +observations, and they must all have the same resolutions (including whether or +not they are grayscale). Additionally, each Sensor Component on an Agent must +have a unique name so that they can be sorted deterministically (the name must +be unique for that Agent, but multiple Agents can have a Sensor Component with +the same name). + +Visual observations also support stacking, by specifying `Observation Stacks` +to a value greater than 1. The visual observations from the last `stackSize` +steps will be stacked on the last dimension (channel dimension). + +When using `RenderTexture` visual observations, a handy feature for debugging is +adding a `Canvas`, then adding a `Raw Image` with it's texture set to the +Agent's `RenderTexture`. This will render the agent observation on the game +screen. + +![RenderTexture with Raw Image](images/visual-observation-rawimage.png) + +The [GridWorld environment](Learning-Environment-Examples.md#gridworld) is an +example on how to use a RenderTexture for both debugging and observation. Note +that in this example, a Camera is rendered to a RenderTexture, which is then +used for observations and debugging. To update the RenderTexture, the Camera +must be asked to render every time a decision is requested within the game code. +When using Cameras as observations directly, this is done automatically by the +Agent. + +![Agent RenderTexture Debug](images/gridworld.png) + +#### Visual Observation Summary & Best Practices + +- To collect visual observations, attach `CameraSensor` or `RenderTextureSensor` + components to the agent GameObject. +- Visual observations should generally only be used when vector observations are + not sufficient. +- Image size should be kept as small as possible, without the loss of needed + details for decision making. +- Images should be made grayscale in situations where color information is not + needed for making informed decisions. + +### Raycast Observations + +Raycasts are another possible method for providing observations to an agent. +This can be easily implemented by adding a `RayPerceptionSensorComponent3D` (or +`RayPerceptionSensorComponent2D`) to the Agent GameObject. + +During observations, several rays (or spheres, depending on settings) are cast +into the physics world, and the objects that are hit determine the observation +vector that is produced. 
+ +![Agent with two RayPerceptionSensorComponent3Ds](images/ray_perception.png) + +Both sensor components have several settings: + +- _Detectable Tags_ A list of strings corresponding to the types of objects that + the Agent should be able to distinguish between. For example, in the WallJump + example, we use "wall", "goal", and "block" as the list of objects to detect. +- _Rays Per Direction_ Determines the number of rays that are cast. One ray is + always cast forward, and this many rays are cast to the left and right. +- _Max Ray Degrees_ The angle (in degrees) for the outermost rays. 90 degrees + corresponds to the left and right of the agent. +- _Sphere Cast Radius_ The size of the sphere used for sphere casting. If set to + 0, rays will be used instead of spheres. Rays may be more efficient, + especially in complex scenes. +- _Ray Length_ The length of the casts +- _Ray Layer Mask_ The [LayerMask](https://docs.unity3d.com/ScriptReference/LayerMask.html) + passed to the raycast or spherecast. This can be used to ignore certain types + of objects when casting. +- _Observation Stacks_ The number of previous results to "stack" with the cast + results. Note that this can be independent of the "Stacked Vectors" setting in + `Behavior Parameters`. +- _Start Vertical Offset_ (3D only) The vertical offset of the ray start point. +- _End Vertical Offset_ (3D only) The vertical offset of the ray end point. +- _Alternating Ray Order_ Alternating is the default, it gives an order of (0, + -delta, delta, -2*delta, 2*delta, ..., -n*delta, n*delta). If alternating is + disabled the order is left to right (-n*delta, -(n-1)*delta, ..., -delta, 0, + delta, ..., (n-1)*delta, n*delta). For general usage there is no difference + but if using custom models the left-to-right layout that matches the spatial + structuring can be preferred (e.g. for processing with conv nets). +- _Use Batched Raycasts_ (3D only) Whether to use batched raycasts. Enable to use batched raycasts and the jobs system. + +In the example image above, the Agent has two `RayPerceptionSensorComponent3D`s. +Both use 3 Rays Per Direction and 90 Max Ray Degrees. One of the components had +a vertical offset, so the Agent can tell whether it's clear to jump over the +wall. + +The total size of the created observations is + +``` +(Observation Stacks) * (1 + 2 * Rays Per Direction) * (Num Detectable Tags + 2) +``` + +so the number of rays and tags should be kept as small as possible to reduce the +amount of data used. Note that this is separate from the State Size defined in +`Behavior Parameters`, so you don't need to worry about the formula above when +setting the State Size. + +#### RayCast Observation Summary & Best Practices + +- Attach `RayPerceptionSensorComponent3D` or `RayPerceptionSensorComponent2D` to + use. +- This observation type is best used when there is relevant spatial information + for the agent that doesn't require a fully rendered image to convey. +- Use as few rays and tags as necessary to solve the problem in order to improve + learning stability and agent performance. +- If you run into performance issues, try using batched raycasts by enabling the _Use Batched Raycast_ setting. + (Only available for 3D ray perception sensors.) + +### Grid Observations +Grid-base observations combine the advantages of 2D spatial representation in +visual observations, and the flexibility of defining detectable objects in +RayCast observations. 
The sensor uses a set of box queries in a grid shape and +gives a top-down 2D view around the agent. This can be implemented by adding a +`GridSensorComponent` to the Agent GameObject. + +During observations, the sensor detects the presence of detectable objects in +each cell and encode that into one-hot representation. The collected information +from each cell forms a 3D tensor observation and will be fed into the +convolutional neural network (CNN) of the agent policy just like visual +observations. + +![Agent with GridSensorComponent](images/grid_sensor.png) + +The sensor component has the following settings: +- _Cell Scale_ The scale of each cell in the grid. +- _Grid Size_ Number of cells on each side of the grid. +- _Agent Game Object_ The Agent that holds the grid sensor. This is used to + disambiguate objects with the same tag as the agent so that the agent doesn't + detect itself. +- _Rotate With Agent_ Whether the grid rotates with the Agent. +- _Detectable Tags_ A list of strings corresponding to the types of objects that + the Agent should be able to distinguish between. +- _Collider Mask_ The [LayerMask](https://docs.unity3d.com/ScriptReference/LayerMask.html) + passed to the collider detection. This can be used to ignore certain types + of objects. +- _Initial Collider Buffer Size_ The initial size of the Collider buffer used + in the non-allocating Physics calls for each cell. +- _Max Collider Buffer Size_ The max size of the Collider buffer used in the + non-allocating Physics calls for each cell. + +The observation for each grid cell is a one-hot encoding of the detected object. +The total size of the created observations is + +``` +GridSize.x * GridSize.z * Num Detectable Tags +``` + +so the number of detectable tags and size of the grid should be kept as small as +possible to reduce the amount of data used. This makes a trade-off between the +granularity of the observation and training speed. + +To allow more variety of observations that grid sensor can capture, the +`GridSensorComponent` and the underlying `GridSensorBase` also provides interfaces +that can be overridden to collect customized observation from detected objects. +See the Unity package documentation for more details on custom grid sensors. + +__Note__: The `GridSensor` only works in 3D environments and will not behave +properly in 2D environments. + +#### Grid Observation Summary & Best Practices + +- Attach `GridSensorComponent` to use. +- This observation type is best used when there is relevant non-visual spatial information that + can be best captured in 2D representations. +- Use as small grid size and as few tags as necessary to solve the problem in order to improve + learning stability and agent performance. +- Do not use `GridSensor` in a 2D game. + +### Variable Length Observations + +It is possible for agents to collect observations from a varying number of +GameObjects by using a `BufferSensor`. +You can add a `BufferSensor` to your Agent by adding a `BufferSensorComponent` to +its GameObject. +The `BufferSensor` can be useful in situations in which the Agent must pay +attention to a varying number of entities (for example, a varying number of +enemies or projectiles). +On the trainer side, the `BufferSensor` +is processed using an attention module. More information about attention +mechanisms can be found [here](https://arxiv.org/abs/1706.03762). Training or +doing inference with variable length observations can be slower than using +a flat vector observation. 
However, attention mechanisms enable solving +problems that require comparative reasoning between entities in a scene +such as our [Sorter environment](Learning-Environment-Examples.md#sorter). +Note that even though the `BufferSensor` can process a variable number of +entities, you still need to define a maximum number of entities. This is +because our network architecture requires to know what the shape of the +observations will be. If fewer entities are observed than the maximum, the +observation will be padded with zeros and the trainer will ignore +the padded observations. Note that attention layers are invariant to +the order of the entities, so there is no need to properly "order" the +entities before feeding them into the `BufferSensor`. + +The `BufferSensorComponent` Editor inspector has two arguments: + + - `Observation Size` : This is how many floats each entities will be + represented with. This number is fixed and all entities must + have the same representation. For example, if the entities you want to + put into the `BufferSensor` have for relevant information position and + speed, then the `Observation Size` should be 6 floats. + - `Maximum Number of Entities` : This is the maximum number of entities + the `BufferSensor` will be able to collect. + +To add an entity's observations to a `BufferSensorComponent`, you need +to call `BufferSensorComponent.AppendObservation()` in the +Agent.CollectObservations() method +with a float array of size `Observation Size` as argument. + +__Note__: Currently, the observations put into the `BufferSensor` are +not normalized, you will need to normalize your observations manually +between -1 and 1. + +#### Variable Length Observation Summary & Best Practices + - Attach `BufferSensorComponent` to use. + - Call `BufferSensorComponent.AppendObservation()` in the + Agent.CollectObservations() methodto add the observations + of an entity to the `BufferSensor`. + - Normalize the entities observations before feeding them into the `BufferSensor`. + +### Goal Signal + +It is possible for agents to collect observations that will be treated as "goal signal". +A goal signal is used to condition the policy of the agent, meaning that if the goal +changes, the policy (i.e. the mapping from observations to actions) will change +as well. Note that this is true +for any observation since all observations influence the policy of the Agent to +some degree. But by specifying a goal signal explicitly, we can make this conditioning +more important to the agent. This feature can be used in settings where an agent +must learn to solve different tasks that are similar by some aspects because the +agent will learn to reuse learnings from different tasks to generalize better. +In Unity, you can specify that a `VectorSensor` or +a `CameraSensor` is a goal by attaching a `VectorSensorComponent` or a +`CameraSensorComponent` to the Agent and selecting `Goal Signal` as `Observation Type`. +On the trainer side, there are two different ways to condition the policy. This +setting is determined by the +[goal_conditioning_type parameter](Training-Configuration-File.md#common-trainer-configurations). +If set to `hyper` (default) a [HyperNetwork](https://arxiv.org/pdf/1609.09106.pdf) +will be used to generate some of the +weights of the policy using the goal observations as input. Note that using a +HyperNetwork requires a lot of computations, it is recommended to use a smaller +number of hidden units in the policy to alleviate this. 
+If set to `none` the goal signal will be considered as regular observations. +For an example on how to use a goal signal, see the +[GridWorld example](Learning-Environment-Examples.md#gridworld). + +#### Goal Signal Summary & Best Practices + - Attach a `VectorSensorComponent` or `CameraSensorComponent` to an agent and + set the observation type to goal to use the feature. + - Set the goal_conditioning_type parameter in the training configuration. + - Reduce the number of hidden units in the network when using the HyperNetwork + conditioning type. + +## Actions and Actuators + +An action is an instruction from the Policy that the agent carries out. The +action is passed to the an `IActionReceiver` (either an `Agent` or an `IActuator`) +as the `ActionBuffers` parameter when the Academy invokes the +`IActionReciever.OnActionReceived()` function. +There are two types of actions supported: **Continuous** and **Discrete**. + +Neither the Policy nor the training algorithm know anything about what the +action values themselves mean. The training algorithm simply tries different +values for the action list and observes the affect on the accumulated rewards +over time and many training episodes. Thus, the only place actions are defined +for an Agent is in the `OnActionReceived()` function. + +For example, if you designed an agent to move in two dimensions, you could use +either continuous or the discrete actions. In the continuous case, you +would set the action size to two (one for each dimension), and the +agent's Policy would output an action with two floating point values. In the +discrete case, you would use one Branch with a size of four (one for each +direction), and the Policy would create an action array containing a single +element with a value ranging from zero to three. Alternatively, you could create +two branches of size two (one for horizontal movement and one for vertical +movement), and the Policy would output an action array containing two elements +with values ranging from zero to one. You could alternatively use a combination of continuous +and discrete actions e.g., using one continuous action for horizontal movement +and a discrete branch of size two for the vertical movement. + +Note that when you are programming actions for an agent, it is often helpful to +test your action logic using the `Heuristic()` method of the Agent, which lets +you map keyboard commands to actions. + +### Continuous Actions + +When an Agent's Policy has **Continuous** actions, the +`ActionBuffers.ContinuousActions` passed to the Agent's `OnActionReceived()` function +is an array with length equal to the `Continuous Action Size` property value. The +individual values in the array have whatever meanings that you ascribe to them. +If you assign an element in the array as the speed of an Agent, for example, the +training process learns to control the speed of the Agent through this +parameter. + +The [3DBall example](Learning-Environment-Examples.md#3dball-3d-balance-ball) uses +continuous actions with two control values. 
+
+![3DBall](images/balance.png)
+
+These control values are applied as rotation to the cube:
+
+```csharp
+    public override void OnActionReceived(ActionBuffers actionBuffers)
+    {
+        var actionZ = 2f * Mathf.Clamp(actionBuffers.ContinuousActions[0], -1f, 1f);
+        var actionX = 2f * Mathf.Clamp(actionBuffers.ContinuousActions[1], -1f, 1f);
+
+        gameObject.transform.Rotate(new Vector3(0, 0, 1), actionZ);
+        gameObject.transform.Rotate(new Vector3(1, 0, 0), actionX);
+    }
+```
+
+By default, the output from our provided PPO algorithm pre-clamps the values of
+`ActionBuffers.ContinuousActions` into the [-1, 1] range. It is a best practice to
+manually clip these as well if you plan to use a third-party algorithm with your
+environment. As shown above, you can scale the control values as needed after
+clamping them.
+
+### Discrete Actions
+
+When an Agent's Policy uses **discrete** actions, the
+`ActionBuffers.DiscreteActions` passed to the Agent's `OnActionReceived()` function
+is an array of integers with length equal to the number of discrete branches. When
+defining the discrete actions, `Branches` is an array of integers where each value
+corresponds to the number of possibilities for that branch.
+
+For example, if we wanted an Agent that can move in a plane and jump, we could
+define two branches (one for motion and one for jumping) because we want our
+agent to be able to move **and** jump concurrently. We define the first branch to
+have 5 possible actions (don't move, go left, go right, go backward, go forward)
+and the second one to have 2 possible actions (don't jump, jump). The
+`OnActionReceived()` method would look something like:
+
+```csharp
+// Get the action index for movement
+int movement = actionBuffers.DiscreteActions[0];
+// Get the action index for jumping
+int jump = actionBuffers.DiscreteActions[1];
+
+// Look up the index in the movement action list:
+if (movement == 1) { directionX = -1; }
+if (movement == 2) { directionX = 1; }
+if (movement == 3) { directionZ = -1; }
+if (movement == 4) { directionZ = 1; }
+// Look up the index in the jump action list:
+if (jump == 1 && IsGrounded()) { directionY = 1; }
+
+// Apply the action results to move the Agent
+gameObject.GetComponent<Rigidbody>().AddForce(
+    new Vector3(
+        directionX * 40f, directionY * 300f, directionZ * 40f));
+```
+
+#### Masking Discrete Actions
+
+When using Discrete Actions, it is possible to specify that some actions are
+impossible for the next decision. When the Agent is controlled by a neural
+network, the Agent will be unable to perform the specified action. Note that
+when the Agent is controlled by its Heuristic, the Agent will still be able to
+decide to perform the masked action. In order to disallow an action, override
+the `Agent.WriteDiscreteActionMask()` virtual method, and call
+`SetActionEnabled()` on the provided `IDiscreteActionMask`:
+
+```csharp
+public override void WriteDiscreteActionMask(IDiscreteActionMask actionMask)
+{
+    actionMask.SetActionEnabled(branch, actionIndex, isEnabled);
+}
+```
+
+Where:
+
+- `branch` is the index (starting at 0) of the branch on which you want to
+  allow or disallow the action.
+- `actionIndex` is the index of the action that you want to allow or disallow.
+- `isEnabled` is a bool indicating whether the action should be allowed or not.
+
+For example, if you have an Agent with 2 branches and on the first branch
+(branch 0) there are 4 possible actions: _"do nothing"_, _"jump"_, _"shoot"_,
+and _"change weapon"_. With the code below, the Agent will either _"do nothing"_
+or _"change weapon"_ for its next decision (since action indices 1 and 2 are
+masked):
+
+```csharp
+actionMask.SetActionEnabled(0, 1, false);
+actionMask.SetActionEnabled(0, 2, false);
+```
+
+Notes:
+
+- You can call `SetActionEnabled` multiple times if you want to put masks on
+  multiple branches.
+- At each step, the state of an action is reset and enabled by default.
+- You cannot mask all the actions of a branch.
+- You cannot mask actions in continuous control.
+
+
+### IActuator interface and ActuatorComponents
+The Actuator API allows users to abstract behavior out of Agents and into
+components (similar to the ISensor API). The `IActuator` interface and `Agent`
+class both implement the `IActionReceiver` interface to allow for backward
+compatibility with the current `Agent.OnActionReceived`. This means you will not
+have to change your code until you decide to use the `IActuator` API.
+
+Like the `ISensor` interface, the `IActuator` interface is intended for advanced
+users.
+
+The `ActuatorComponent` abstract class is used to create the actual `IActuator`
+at runtime. It must be attached to the same `GameObject` as the `Agent`, or to a
+child `GameObject`. Actuators and all of their data structures are initialized
+during `Agent.Initialize`. This is done to prevent unexpected allocations at
+runtime.
+
+You can find an example of an `IActuator` implementation in the `Basic` example
+scene. **NOTE**: you do not need to adjust the Actions in the Agent's
+`Behavior Parameters` when using an `IActuator` and `ActuatorComponents`.
+
+Internally, `Agent.OnActionReceived` uses an `IActuator` to send actions to the
+Agent, although this is mostly abstracted from the user.
+
+
+### Actions Summary & Best Practices
+
+- Agents can use `Discrete` and/or `Continuous` actions.
+- Discrete actions can have multiple action branches, and it's possible to mask
+  certain actions so that they won't be taken.
+- In general, fewer actions will make for easier learning.
+- Be sure to set the Continuous Action Size and Discrete Branch Size to the
+  desired number for each type of action, and not greater, as doing the latter
+  can interfere with the efficiency of the training process.
+- Continuous action values should be clipped to an appropriate range. The
+  provided PPO model automatically clips these values between -1 and 1, but
+  third-party training systems may not do so.
+
+## Rewards
+
+In reinforcement learning, the reward is a signal that the agent has done
+something right. The PPO reinforcement learning algorithm works by optimizing
+the choices an agent makes such that the agent earns the highest cumulative
+reward over time. The better your reward mechanism, the better your agent will
+learn.
+
+**Note:** Rewards are not used during inference by an Agent using a trained
+model and are also not used during imitation learning.
+
+Perhaps the best advice is to start simple and only add complexity as needed. In
+general, you should reward results rather than actions you think will lead to
+the desired results. You can even use the Agent's Heuristic to control the Agent
+while watching how it accumulates rewards.
+
+Allocate rewards to an Agent by calling the `AddReward()` or `SetReward()`
+methods on the agent. The reward assigned between each decision should be in the
+range [-1, 1]. Values outside this range can lead to unstable training. The
+`reward` value is reset to zero when the agent receives a new decision.
If there +are multiple calls to `AddReward()` for a single agent decision, the rewards +will be summed together to evaluate how good the previous decision was. The +`SetReward()` will override all previous rewards given to an agent since the +previous decision. + +### Examples + +You can examine the `OnActionReceived()` functions defined in the +[example environments](Learning-Environment-Examples.md) to see how those +projects allocate rewards. + +The `GridAgent` class in the +[GridWorld example](Learning-Environment-Examples.md#gridworld) uses a very +simple reward system: + +```csharp +Collider[] hitObjects = Physics.OverlapBox(trueAgent.transform.position, + new Vector3(0.3f, 0.3f, 0.3f)); +if (hitObjects.Where(col => col.gameObject.tag == "goal").ToArray().Length == 1) +{ + AddReward(1.0f); + EndEpisode(); +} +else if (hitObjects.Where(col => col.gameObject.tag == "pit").ToArray().Length == 1) +{ + AddReward(-1f); + EndEpisode(); +} +``` + +The agent receives a positive reward when it reaches the goal and a negative +reward when it falls into the pit. Otherwise, it gets no rewards. This is an +example of a _sparse_ reward system. The agent must explore a lot to find the +infrequent reward. + +In contrast, the `AreaAgent` in the +[Area example](Learning-Environment-Examples.md#push-block) gets a small +negative reward every step. In order to get the maximum reward, the agent must +finish its task of reaching the goal square as quickly as possible: + +```csharp +AddReward( -0.005f); +MoveAgent(act); + +if (gameObject.transform.position.y < 0.0f || + Mathf.Abs(gameObject.transform.position.x - area.transform.position.x) > 8f || + Mathf.Abs(gameObject.transform.position.z + 5 - area.transform.position.z) > 8) +{ + AddReward(-1f); + EndEpisode(); +} +``` + +The agent also gets a larger negative penalty if it falls off the playing +surface. + +The `Ball3DAgent` in the +[3DBall](Learning-Environment-Examples.md#3dball-3d-balance-ball) takes a +similar approach, but allocates a small positive reward as long as the agent +balances the ball. The agent can maximize its rewards by keeping the ball on the +platform: + +```csharp + +SetReward(0.1f); + +// When ball falls mark Agent as finished and give a negative penalty +if ((ball.transform.position.y - gameObject.transform.position.y) < -2f || + Mathf.Abs(ball.transform.position.x - gameObject.transform.position.x) > 3f || + Mathf.Abs(ball.transform.position.z - gameObject.transform.position.z) > 3f) +{ + SetReward(-1f); + EndEpisode(); + +} +``` + +The `Ball3DAgent` also assigns a negative penalty when the ball falls off the +platform. + +Note that all of these environments make use of the `EndEpisode()` method, which +manually terminates an episode when a termination condition is reached. This can +be called independently of the `Max Step` property. + +### Rewards Summary & Best Practices + +- Use `AddReward()` to accumulate rewards between decisions. Use `SetReward()` + to overwrite any previous rewards accumulate between decisions. +- The magnitude of any given reward should typically not be greater than 1.0 in + order to ensure a more stable learning process. +- Positive rewards are often more helpful to shaping the desired behavior of an + agent than negative rewards. Excessive negative rewards can result in the + agent failing to learn any meaningful behavior. +- For locomotion tasks, a small positive reward (+0.1) for forward velocity is + typically used. 
+- If you want the agent to finish a task quickly, it is often helpful to provide + a small penalty every step (-0.05) that the agent does not complete the task. + In this case completion of the task should also coincide with the end of the + episode by calling `EndEpisode()` on the agent when it has accomplished its + goal. + +## Agent Properties + +![Agent Inspector](images/3dball_learning_brain.png) + +- `Behavior Parameters` - The parameters dictating what Policy the Agent will + receive. + - `Behavior Name` - The identifier for the behavior. Agents with the same + behavior name will learn the same policy. + - `Vector Observation` + - `Space Size` - Length of vector observation for the Agent. + - `Stacked Vectors` - The number of previous vector observations that will + be stacked and used collectively for decision making. This results in the + effective size of the vector observation being passed to the Policy being: + _Space Size_ x _Stacked Vectors_. + - `Actions` + - `Continuous Actions` - The number of concurrent continuous actions that + the Agent can take. + - `Discrete Branches` - An array of integers, defines multiple concurrent + discrete actions. The values in the `Discrete Branches` array correspond + to the number of possible discrete values for each action branch. + - `Model` - The neural network model used for inference (obtained after + training) + - `Inference Device` - Whether to use CPU or GPU to run the model during + inference + - `Behavior Type` - Determines whether the Agent will do training, inference, + or use its Heuristic() method: + - `Default` - the Agent will train if they connect to a python trainer, + otherwise they will perform inference. + - `Heuristic Only` - the Agent will always use the `Heuristic()` method. + - `Inference Only` - the Agent will always perform inference. + - `Team ID` - Used to define the team for self-play + - `Use Child Sensors` - Whether to use all Sensor components attached to child + GameObjects of this Agent. +- `Max Step` - The per-agent maximum number of steps. Once this number is + reached, the Agent will be reset. + +## Destroying an Agent + +You can destroy an Agent GameObject during the simulation. Make sure that there +is always at least one Agent training at all times by either spawning a new +Agent every time one is destroyed or by re-spawning new Agents when the whole +environment resets. + +## Defining Multi-agent Scenarios + +### Teams for Adversarial Scenarios + +Self-play is triggered by including the self-play hyperparameter hierarchy in +the [trainer configuration](Training-ML-Agents.md#training-configurations). To +distinguish opposing agents, set the team ID to different integer values in the +behavior parameters script on the agent prefab. + +

+_(Figure: Team ID)_
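+
+Team IDs are usually set in the Inspector, but they can also be assigned from
+code through the `BehaviorParameters.TeamId` property. A minimal sketch (the
+`TeamAssigner` component is illustrative):
+
+```csharp
+using Unity.MLAgents.Policies;
+using UnityEngine;
+
+public class TeamAssigner : MonoBehaviour
+{
+    // Illustrative: use 0 for one team and 1 for the opposing team.
+    public int teamId;
+
+    void Awake()
+    {
+        GetComponent<BehaviorParameters>().TeamId = teamId;
+    }
+}
+```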

+ +**_Team ID must be 0 or an integer greater than 0._** + +In symmetric games, since all agents (even on opposing teams) will share the +same policy, they should have the same 'Behavior Name' in their Behavior +Parameters Script. In asymmetric games, they should have a different Behavior +Name in their Behavior Parameters script. Note, in asymmetric games, the agents +must have both different Behavior Names _and_ different team IDs! + +For examples of how to use this feature, you can see the trainer configurations +and agent prefabs for our Tennis and Soccer environments. Tennis and Soccer +provide examples of symmetric games. To train an asymmetric game, specify +trainer configurations for each of your behavior names and include the self-play +hyperparameter hierarchy in both. + +### Groups for Cooperative Scenarios + +Cooperative behavior in ML-Agents can be enabled by instantiating a `SimpleMultiAgentGroup`, +typically in an environment controller or similar script, and adding agents to it +using the `RegisterAgent(Agent agent)` method. Note that all agents added to the same `SimpleMultiAgentGroup` +must have the same behavior name and Behavior Parameters. Using `SimpleMultiAgentGroup` enables the +agents within a group to learn how to work together to achieve a common goal (i.e., +maximize a group-given reward), even if one or more of the group members are removed +before the episode ends. You can then use this group to add/set rewards, end or interrupt episodes +at a group level using the `AddGroupReward()`, `SetGroupReward()`, `EndGroupEpisode()`, and +`GroupEpisodeInterrupted()` methods. For example: + +```csharp +// Create a Multi Agent Group in Start() or Initialize() +m_AgentGroup = new SimpleMultiAgentGroup(); + +// Register agents in group at the beginning of an episode +for (var agent in AgentList) +{ + m_AgentGroup.RegisterAgent(agent); +} + +// if the team scores a goal +m_AgentGroup.AddGroupReward(rewardForGoal); + +// If the goal is reached and the episode is over +m_AgentGroup.EndGroupEpisode(); +ResetScene(); + +// If time ran out and we need to interrupt the episode +m_AgentGroup.GroupEpisodeInterrupted(); +ResetScene(); +``` + +Multi Agent Groups should be used with the MA-POCA trainer, which is explicitly designed to train +cooperative environments. This can be enabled by using the `poca` trainer - see the +[training configurations](Training-Configuration-File.md) doc for more information on +configuring MA-POCA. When using MA-POCA, agents which are deactivated or removed from the Scene +during the episode will still learn to contribute to the group's long term rewards, even +if they are not active in the scene to experience them. + +See the [Cooperative Push Block](Learning-Environment-Examples.md#cooperative-push-block) environment +for an example of how to use Multi Agent Groups, and the +[Dungeon Escape](Learning-Environment-Examples.md#dungeon-escape) environment for an example of +how the Multi Agent Group can be used with agents that are removed from the scene mid-episode. + +**NOTE**: Groups differ from Teams (for competitive settings) in the following way - Agents +working together should be added to the same Group, while agents playing against each other +should be given different Team Ids. If in the Scene there is one playing field and two teams, +there should be two Groups, one for each team, and each team should be assigned a different +Team Id. If this playing field is duplicated many times in the Scene (e.g. 
for training +speedup), there should be two Groups _per playing field_, and two unique Team Ids +_for the entire Scene_. In environments with both Groups and Team Ids configured, MA-POCA and +self-play can be used together for training. In the diagram below, there are two agents on each team, +and two playing fields where teams are pitted against each other. All the blue agents should share a Team Id +(and the orange ones a different ID), and there should be four group managers, one per pair of agents. + +

+_(Figure: Group Manager vs Team Id)_

+ +Please see the [SoccerTwos](Learning-Environment-Examples.md#soccer-twos) environment for an example. + +#### Cooperative Behaviors Notes and Best Practices +* An agent can only be registered to one MultiAgentGroup at a time. If you want to re-assign an +agent from one group to another, you have to unregister it from the current group first. + +* Agents with different behavior names in the same group are not supported. + +* Agents within groups should always set the `Max Steps` parameter in the Agent script to 0. +Instead, handle Max Steps using the MultiAgentGroup by ending the episode for the entire +Group using `GroupEpisodeInterrupted()`. + +* `EndGroupEpisode` and `GroupEpisodeInterrupted` do the same job in the game, but has +slightly different effect on the training. If the episode is completed, you would want to call +`EndGroupEpisode`. But if the episode is not over but it has been running for enough steps, i.e. +reaching max step, you would call `GroupEpisodeInterrupted`. + +* If an agent finished earlier, e.g. completed tasks/be removed/be killed in the game, do not call +`EndEpisode()` on the Agent. Instead, disable the agent and re-enable it when the next episode starts, +or destroy the agent entirely. This is because calling `EndEpisode()` will call `OnEpisodeBegin()`, which +will reset the agent immediately. While it is possible to call `EndEpisode()` in this way, it is usually not the +desired behavior when training groups of agents. + +* If an agent that was disabled in a scene needs to be re-enabled, it must be re-registered to the MultiAgentGroup. + +* Group rewards are meant to reinforce agents to act in the group's best interest instead of +individual ones, and are treated differently than individual agent rewards during +training. So calling `AddGroupReward()` is not equivalent to calling agent.AddReward() on each agent +in the group. + +* You can still add incremental rewards to agents using `Agent.AddReward()` if they are +in a Group. These rewards will only be given to those agents and are received when the +Agent is active. + +* Environments which use Multi Agent Groups can be trained using PPO or SAC, but agents will +not be able to learn from group rewards after deactivation/removal, nor will they behave as cooperatively. + +## Recording Demonstrations + +In order to record demonstrations from an agent, add the +`Demonstration Recorder` component to a GameObject in the scene which contains +an `Agent` component. Once added, it is possible to name the demonstration that +will be recorded from the agent. + +

+_(Figure: Demonstration Recorder)_

+ +When `Record` is checked, a demonstration will be created whenever the scene is +played from the Editor. Depending on the complexity of the task, anywhere from a +few minutes or a few hours of demonstration data may be necessary to be useful +for imitation learning. To specify an exact number of steps you want to record +use the `Num Steps To Record` field and the editor will end your play session +automatically once that many steps are recorded. If you set `Num Steps To Record` +to `0` then recording will continue until you manually end the play session. Once +the play session ends a `.demo` file will be created in the `Assets/Demonstrations` +folder (by default). This file contains the demonstrations. Clicking on the file will +provide metadata about the demonstration in the inspector. + +

+_(Figure: Demonstration Inspector)_

+
+You can then specify the path to this file in your
+[training configurations](Training-Configuration-File.md#behavioral-cloning).
diff --git a/com.unity.ml-agents/Documentation~/Learning-Environment-Design.md b/com.unity.ml-agents/Documentation~/Learning-Environment-Design.md
new file mode 100644
index 0000000000..984be604fc
--- /dev/null
+++ b/com.unity.ml-agents/Documentation~/Learning-Environment-Design.md
@@ -0,0 +1,167 @@
+# Designing a Learning Environment
+
+This page contains general advice on how to design your learning environment, in
+addition to overviewing aspects of the ML-Agents Unity SDK that pertain to
+setting up your scene and simulation as opposed to designing your agents within
+the scene. We have a dedicated page on
+[Designing Agents](Learning-Environment-Design-Agents.md) which includes how to
+instrument observations, actions and rewards, define teams for multi-agent
+scenarios and record agent demonstrations for imitation learning.
+
+To help you on-board to the entire set of functionality provided by the ML-Agents
+Toolkit, we recommend exploring our [API documentation](API-Reference.md).
+Additionally, our [example environments](Learning-Environment-Examples.md) are a
+great resource as they provide sample usage of almost all of our features.
+
+## The Simulation and Training Process
+
+Training and simulation proceed in steps orchestrated by the ML-Agents Academy
+class. The Academy works with Agent objects in the scene to step through the
+simulation.
+
+During training, the external Python training process communicates with the
+Academy to run a series of episodes while it collects data and optimizes its
+neural network model. When training is completed successfully, you can add the
+trained model file to your Unity project for later use.
+
+The ML-Agents Academy class orchestrates the agent simulation loop as follows:
+
+1. Calls your Academy's `OnEnvironmentReset` delegate.
+1. Calls the `OnEpisodeBegin()` function for each Agent in the scene.
+1. Gathers information about the scene. This is done by calling the
+   `CollectObservations(VectorSensor sensor)` function for each Agent in the
+   scene, as well as updating their sensors and collecting the resulting
+   observations.
+1. Uses each Agent's Policy to decide on the Agent's next action.
+1. Calls the `OnActionReceived()` function for each Agent in the scene, passing
+   in the action chosen by the Agent's Policy.
+1. Calls the Agent's `OnEpisodeBegin()` function if the Agent has reached its
+   `Max Step` count or has otherwise ended its episode with `EndEpisode()`.
+
+To create a training environment, extend the Agent class to implement the above
+methods. Whether you need to implement them or not depends on your specific
+scenario.
+
+## Organizing the Unity Scene
+
+To train and use the ML-Agents Toolkit in a Unity scene, the scene must contain
+as many Agent subclasses as you need. Agent instances should be attached to the
+GameObject representing that Agent.
+
+### Academy
+
+The Academy is a singleton which orchestrates Agents and their decision making
+processes. Only a single Academy exists at a time.
+
+#### Academy resetting
+
+To alter the environment at the start of each episode, add your method to the
+Academy's `OnEnvironmentReset` action.
+
+```csharp
+using Unity.MLAgents;
+using UnityEngine;
+
+public class MySceneBehavior : MonoBehaviour
+{
+    public void Awake()
+    {
+        // Register for the reset event raised at the start of each episode.
+        Academy.Instance.OnEnvironmentReset += EnvironmentReset;
+    }
+
+    void EnvironmentReset()
+    {
+        // Reset the scene here
+    }
+}
+```
+
+For example, you might want to reset an Agent to its starting position or move a
+goal to a random position. An environment resets when the `reset()` method is
+called on the Python `UnityEnvironment`.
+
+When you reset an environment, consider the factors that should change so that
+training is generalizable to different conditions. For example, if you were
+training a maze-solving agent, you would probably want to change the maze itself
+for each training episode. Otherwise, the agent would likely only learn to solve
+that particular maze, not mazes in general.
+
+### Multiple Areas
+
+In many of the example environments, many copies of the training area are
+instantiated in the scene. This generally speeds up training, allowing the
+environment to gather many experiences in parallel. This can be achieved simply
+by instantiating many Agents with the same Behavior Name. If possible, consider
+designing your scene to support multiple areas.
+
+Check out our example environments to see examples of multiple areas.
+Additionally, the
+[Making a New Learning Environment](Learning-Environment-Create-New.md#optional-multiple-training-areas-within-the-same-scene)
+guide demonstrates this option.
+
+## Environments
+
+When you create a training environment in Unity, you must set up the scene so
+that it can be controlled by the external training process. Considerations
+include:
+
+- The training scene must start automatically when your Unity application is
+  launched by the training process.
+- The Academy must reset the scene to a valid starting point for each episode of
+  training.
+- A training episode must have a definite end — either using `Max Steps` or by
+  each Agent ending its episode manually with `EndEpisode()`.
+
+## Environment Parameters
+
+Curriculum learning and environment parameter randomization are two training
+methods that control specific parameters in your environment. As such, it is
+important to ensure that your environment parameters are updated at each step to
+the correct values. To enable this, we expose an `EnvironmentParameters` C# class
+that you can use to retrieve the values of the parameters defined in the
+training configurations for both of those features. Please see our
+[documentation](Training-ML-Agents.md#environment-parameters)
+for curriculum learning and environment parameter randomization for details.
+
+We recommend modifying the environment from the Agent's `OnEpisodeBegin()`
+function by leveraging `Academy.Instance.EnvironmentParameters`. See the
+WallJump example environment for a sample usage (specifically,
+[WallJumpAgent.cs](../Project/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs)).
+
+## Agent
+
+The Agent class represents an actor in the scene that collects observations and
+carries out actions. The Agent class is typically attached to the GameObject in
+the scene that otherwise represents the actor — for example, to a player object
+in a football game or a car object in a vehicle simulation. Every Agent must
+have appropriate `Behavior Parameters`.
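+
+For orientation, the sketch below shows what a minimal Agent might look like; it
+ties the `Academy.Instance.EnvironmentParameters` usage described above to the
+core Agent callbacks covered in the rest of this section. The class name, the
+`obstacle_scale` parameter, the `obstacle` reference and the reward values are
+illustrative assumptions only, not taken from the example environments.
+
+```csharp
+using Unity.MLAgents;
+using Unity.MLAgents.Actuators;
+using Unity.MLAgents.Sensors;
+using UnityEngine;
+
+public class ExampleAgent : Agent
+{
+    // A piece of the environment this Agent reconfigures each episode
+    // (hypothetical; assigned in the Inspector).
+    [SerializeField] Transform obstacle;
+
+    public override void OnEpisodeBegin()
+    {
+        // Reconfigure the environment from a value defined in the training
+        // configuration (curriculum or parameter randomization), falling back
+        // to a default when no trainer is connected.
+        var scale = Academy.Instance.EnvironmentParameters.GetWithDefault(
+            "obstacle_scale", 1.0f);
+        obstacle.localScale = Vector3.one * scale;
+
+        // Reset the Agent itself to a valid starting state.
+        transform.localPosition = Vector3.zero;
+    }
+
+    public override void CollectObservations(VectorSensor sensor)
+    {
+        // Observe the Agent's own position (3 floats).
+        sensor.AddObservation(transform.localPosition);
+    }
+
+    public override void OnActionReceived(ActionBuffers actions)
+    {
+        // Apply the first continuous action as forward movement.
+        var forward = actions.ContinuousActions[0];
+        transform.Translate(0f, 0f, forward * Time.fixedDeltaTime);
+
+        // Small per-step penalty; end the episode on an unrecoverable failure.
+        AddReward(-0.001f);
+        if (transform.localPosition.y < -1f)
+        {
+            SetReward(-1f);
+            EndEpisode();
+        }
+    }
+}
+```
+
+Note that `OnActionReceived` receives an `ActionBuffers` struct containing the
+continuous and discrete actions chosen by the Agent's Policy.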
+
+Generally, when creating an Agent, you should extend the Agent class and implement
+the `CollectObservations(VectorSensor sensor)` and `OnActionReceived()` methods:
+
+- `CollectObservations(VectorSensor sensor)` — Collects the Agent's observation
+  of its environment.
+- `OnActionReceived()` — Carries out the action chosen by the Agent's Policy and
+  assigns a reward to the current state.
+
+Your implementations of these functions determine how the Behavior Parameters
+assigned to this Agent must be set.
+
+You must also determine how an Agent finishes its task or times out. You can
+manually terminate an Agent episode in your `OnActionReceived()` function when
+the Agent has finished (or irrevocably failed) its task by calling the
+`EndEpisode()` function. You can also set the Agent's `Max Steps` property to a
+positive value and the Agent will consider the episode over after it has taken
+that many steps. You can use the `Agent.OnEpisodeBegin()` function to prepare
+the Agent to start again.
+
+See [Agents](Learning-Environment-Design-Agents.md) for detailed information
+about programming your own Agents.
+
+## Recording Statistics
+
+We offer developers a mechanism to record statistics from within their Unity
+environments. These statistics are aggregated and reported during the training
+process. To record statistics, see the `StatsRecorder` C# class.
+
+See the FoodCollector example environment for a sample usage (specifically,
+[FoodCollectorSettings.cs](../Project/Assets/ML-Agents/Examples/FoodCollector/Scripts/FoodCollectorSettings.cs)).
diff --git a/com.unity.ml-agents/Documentation~/Learning-Environment-Examples.md b/com.unity.ml-agents/Documentation~/Learning-Environment-Examples.md
new file mode 100644
index 0000000000..ba46a5847c
--- /dev/null
+++ b/com.unity.ml-agents/Documentation~/Learning-Environment-Examples.md
@@ -0,0 +1,500 @@
+# Example Learning Environments
+
+
+
+The Unity ML-Agents Toolkit includes an expanding set of example environments
+that highlight the various features of the toolkit. These environments can also
+serve as templates for new environments or as ways to test new ML algorithms.
+Environments are located in `Project/Assets/ML-Agents/Examples` and summarized
+below.
+
+For the environments that highlight specific features of the toolkit, we provide
+the pre-trained model files and the training config file that enables you to
+train the scene yourself. The environments that are designed to serve as
+challenges for researchers do not have accompanying pre-trained model files or
+training configs and are marked as _Optional_ below.
+
+This page only overviews the example environments we provide. To learn more
+about how to design and build your own environments, see our
+[Making a New Learning Environment](Learning-Environment-Create-New.md) page. If
+you would like to contribute environments, please see our
+[contribution guidelines](CONTRIBUTING.md) page.
+
+## Basic
+
+![Basic](images/basic.png)
+
+- Set-up: A linear movement task where the agent must move left or right to
+  rewarding states.
+- Goal: Move to the most rewarding state.
+- Agents: The environment contains one agent.
+- Agent Reward Function:
+  - -0.01 at each step
+  - +0.1 for arriving at suboptimal state.
+  - +1.0 for arriving at optimal state.
+- Behavior Parameters:
+  - Vector Observation space: One variable corresponding to current state.
+  - Actions: 1 discrete action branch with 3 actions (Move left, do nothing, move
+    right).
+ - Visual Observations: None +- Float Properties: None +- Benchmark Mean Reward: 0.93 + +## 3DBall: 3D Balance Ball + +![3D Balance Ball](images/balance.png) + +- Set-up: A balance-ball task, where the agent balances the ball on it's head. +- Goal: The agent must balance the ball on it's head for as long as possible. +- Agents: The environment contains 12 agents of the same kind, all using the + same Behavior Parameters. +- Agent Reward Function: + - +0.1 for every step the ball remains on it's head. + - -1.0 if the ball falls off. +- Behavior Parameters: + - Vector Observation space: 8 variables corresponding to rotation of the agent + cube, and position and velocity of ball. + - Vector Observation space (Hard Version): 5 variables corresponding to + rotation of the agent cube and position of ball. + - Actions: 2 continuous actions, with one value corresponding to + X-rotation, and the other to Z-rotation. + - Visual Observations: Third-person view from the upper-front of the agent. Use + `Visual3DBall` scene. +- Float Properties: Three + - scale: Specifies the scale of the ball in the 3 dimensions (equal across the + three dimensions) + - Default: 1 + - Recommended Minimum: 0.2 + - Recommended Maximum: 5 + - gravity: Magnitude of gravity + - Default: 9.81 + - Recommended Minimum: 4 + - Recommended Maximum: 105 + - mass: Specifies mass of the ball + - Default: 1 + - Recommended Minimum: 0.1 + - Recommended Maximum: 20 +- Benchmark Mean Reward: 100 + +## GridWorld + +![GridWorld](images/gridworld.png) + +- Set-up: A multi-goal version of the grid-world task. Scene contains agent, goal, + and obstacles. +- Goal: The agent must navigate the grid to the appropriate goal while + avoiding the obstacles. +- Agents: The environment contains nine agents with the same Behavior + Parameters. +- Agent Reward Function: + - -0.01 for every step. + - +1.0 if the agent navigates to the correct goal (episode ends). + - -1.0 if the agent navigates to an incorrect goal (episode ends). +- Behavior Parameters: + - Vector Observation space: None + - Actions: 1 discrete action branch with 5 actions, corresponding to movement in + cardinal directions or not moving. Note that for this environment, + [action masking](Learning-Environment-Design-Agents.md#masking-discrete-actions) + is turned on by default (this option can be toggled using the `Mask Actions` + checkbox within the `trueAgent` GameObject). The trained model file provided + was generated with action masking turned on. + - Visual Observations: One corresponding to top-down view of GridWorld. + - Goal Signal : A one hot vector corresponding to which color is the correct goal + for the Agent +- Float Properties: Three, corresponding to grid size, number of green goals, and + number of red goals. +- Benchmark Mean Reward: 0.8 + +## Push Block + +![Push](images/push.png) + +- Set-up: A platforming environment where the agent can push a block around. +- Goal: The agent must push the block to the goal. +- Agents: The environment contains one agent. +- Agent Reward Function: + - -0.0025 for every step. + - +1.0 if the block touches the goal. +- Behavior Parameters: + - Vector Observation space: (Continuous) 70 variables corresponding to 14 + ray-casts each detecting one of three possible objects (wall, goal, or + block). + - Actions: 1 discrete action branch with 7 actions, corresponding to turn clockwise + and counterclockwise, move along four different face directions, or do nothing. 
+- Float Properties: Four + - block_scale: Scale of the block along the x and z dimensions + - Default: 2 + - Recommended Minimum: 0.5 + - Recommended Maximum: 4 + - dynamic_friction: Coefficient of friction for the ground material acting on + moving objects + - Default: 0 + - Recommended Minimum: 0 + - Recommended Maximum: 1 + - static_friction: Coefficient of friction for the ground material acting on + stationary objects + - Default: 0 + - Recommended Minimum: 0 + - Recommended Maximum: 1 + - block_drag: Effect of air resistance on block + - Default: 0.5 + - Recommended Minimum: 0 + - Recommended Maximum: 2000 +- Benchmark Mean Reward: 4.5 + +## Wall Jump + +![Wall](images/wall.png) + +- Set-up: A platforming environment where the agent can jump over a wall. +- Goal: The agent must use the block to scale the wall and reach the goal. +- Agents: The environment contains one agent linked to two different Models. The + Policy the agent is linked to changes depending on the height of the wall. The + change of Policy is done in the WallJumpAgent class. +- Agent Reward Function: + - -0.0005 for every step. + - +1.0 if the agent touches the goal. + - -1.0 if the agent falls off the platform. +- Behavior Parameters: + - Vector Observation space: Size of 74, corresponding to 14 ray casts each + detecting 4 possible objects. plus the global position of the agent and + whether or not the agent is grounded. + - Actions: 4 discrete action branches: + - Forward Motion (3 possible actions: Forward, Backwards, No Action) + - Rotation (3 possible actions: Rotate Left, Rotate Right, No Action) + - Side Motion (3 possible actions: Left, Right, No Action) + - Jump (2 possible actions: Jump, No Action) + - Visual Observations: None +- Float Properties: Four +- Benchmark Mean Reward (Big & Small Wall): 0.8 + +## Crawler + +![Crawler](images/crawler.png) + +- Set-up: A creature with 4 arms and 4 forearms. +- Goal: The agents must move its body toward the goal direction without falling. +- Agents: The environment contains 10 agents with same Behavior Parameters. +- Agent Reward Function (independent): + The reward function is now geometric meaning the reward each step is a product + of all the rewards instead of a sum, this helps the agent try to maximize all + rewards instead of the easiest rewards. + - Body velocity matches goal velocity. (normalized between (0,1)) + - Head direction alignment with goal direction. (normalized between (0,1)) +- Behavior Parameters: + - Vector Observation space: 172 variables corresponding to position, rotation, + velocity, and angular velocities of each limb plus the acceleration and + angular acceleration of the body. + - Actions: 20 continuous actions, corresponding to target + rotations for joints. + - Visual Observations: None +- Float Properties: None +- Benchmark Mean Reward: 3000 + +## Worm + +![Worm](images/worm.png) + +- Set-up: A worm with a head and 3 body segments. +- Goal: The agents must move its body toward the goal direction. +- Agents: The environment contains 10 agents with same Behavior Parameters. +- Agent Reward Function (independent): + The reward function is now geometric meaning the reward each step is a product + of all the rewards instead of a sum, this helps the agent try to maximize all + rewards instead of the easiest rewards. + - Body velocity matches goal velocity. (normalized between (0,1)) + - Body direction alignment with goal direction. 
(normalized between (0,1)) +- Behavior Parameters: + - Vector Observation space: 64 variables corresponding to position, rotation, + velocity, and angular velocities of each limb plus the acceleration and + angular acceleration of the body. + - Actions: 9 continuous actions, corresponding to target + rotations for joints. + - Visual Observations: None +- Float Properties: None +- Benchmark Mean Reward: 800 + +## Food Collector + +![Collector](images/foodCollector.png) + +- Set-up: A multi-agent environment where agents compete to collect food. +- Goal: The agents must learn to collect as many green food spheres as possible + while avoiding red spheres. +- Agents: The environment contains 5 agents with same Behavior Parameters. +- Agent Reward Function (independent): + - +1 for interaction with green spheres + - -1 for interaction with red spheres +- Behavior Parameters: + - Vector Observation space: 53 corresponding to velocity of agent (2), whether + agent is frozen and/or shot its laser (2), plus grid based perception of + objects around agent's forward direction (40 by 40 with 6 different categories). + - Actions: + - 3 continuous actions correspond to Forward Motion, Side Motion and Rotation + - 1 discrete action branch for Laser with 2 possible actions corresponding to + Shoot Laser or No Action + - Visual Observations (Optional): First-person camera per-agent, plus one vector + flag representing the frozen state of the agent. This scene uses a combination + of vector and visual observations and the training will not succeed without + the frozen vector flag. Use `VisualFoodCollector` scene. +- Float Properties: Two + - laser_length: Length of the laser used by the agent + - Default: 1 + - Recommended Minimum: 0.2 + - Recommended Maximum: 7 + - agent_scale: Specifies the scale of the agent in the 3 dimensions (equal + across the three dimensions) + - Default: 1 + - Recommended Minimum: 0.5 + - Recommended Maximum: 5 +- Benchmark Mean Reward: 10 + +## Hallway + +![Hallway](images/hallway.png) + +- Set-up: Environment where the agent needs to find information in a room, + remember it, and use it to move to the correct goal. +- Goal: Move to the goal which corresponds to the color of the block in the + room. +- Agents: The environment contains one agent. +- Agent Reward Function (independent): + - +1 For moving to correct goal. + - -0.1 For moving to incorrect goal. + - -0.0003 Existential penalty. +- Behavior Parameters: + - Vector Observation space: 30 corresponding to local ray-casts detecting + objects, goals, and walls. + - Actions: 1 discrete action Branch, with 4 actions corresponding to agent + rotation and forward/backward movement. +- Float Properties: None +- Benchmark Mean Reward: 0.7 + - To train this environment, you can enable curiosity by adding the `curiosity` reward signal + in `config/ppo/Hallway.yaml` + +## Soccer Twos + +![SoccerTwos](images/soccer.png) + +- Set-up: Environment where four agents compete in a 2 vs 2 toy soccer game. +- Goal: + - Get the ball into the opponent's goal while preventing the ball from + entering own goal. +- Agents: The environment contains two different Multi Agent Groups with two agents in each. + Parameters : SoccerTwos. +- Agent Reward Function (dependent): + - (1 - `accumulated time penalty`) When ball enters opponent's goal + `accumulated time penalty` is incremented by (1 / `MaxStep`) every fixed + update and is reset to 0 at the beginning of an episode. + - -1 When ball enters team's goal. 
+- Behavior Parameters: + - Vector Observation space: 336 corresponding to 11 ray-casts forward + distributed over 120 degrees and 3 ray-casts backward distributed over 90 + degrees each detecting 6 possible object types, along with the object's + distance. The forward ray-casts contribute 264 state dimensions and backward + 72 state dimensions over three observation stacks. + - Actions: 3 discrete branched actions corresponding to + forward, backward, sideways movement, as well as rotation. + - Visual Observations: None +- Float Properties: Two + - ball_scale: Specifies the scale of the ball in the 3 dimensions (equal + across the three dimensions) + - Default: 7.5 + - Recommended minimum: 4 + - Recommended maximum: 10 + - gravity: Magnitude of the gravity + - Default: 9.81 + - Recommended minimum: 6 + - Recommended maximum: 20 + +## Strikers Vs. Goalie + +![StrikersVsGoalie](images/strikersvsgoalie.png) + +- Set-up: Environment where two agents compete in a 2 vs 1 soccer variant. +- Goal: + - Striker: Get the ball into the opponent's goal. + - Goalie: Keep the ball out of the goal. +- Agents: The environment contains two different Multi Agent Groups. One with two Strikers and the other one Goalie. + Behavior Parameters : Striker, Goalie. +- Striker Agent Reward Function (dependent): + - +1 When ball enters opponent's goal. + - -0.001 Existential penalty. +- Goalie Agent Reward Function (dependent): + - -1 When ball enters goal. + - 0.001 Existential bonus. +- Behavior Parameters: + - Striker Vector Observation space: 294 corresponding to 11 ray-casts forward + distributed over 120 degrees and 3 ray-casts backward distributed over 90 + degrees each detecting 5 possible object types, along with the object's + distance. The forward ray-casts contribute 231 state dimensions and backward + 63 state dimensions over three observation stacks. + - Striker Actions: 3 discrete branched actions corresponding + to forward, backward, sideways movement, as well as rotation. + - Goalie Vector Observation space: 738 corresponding to 41 ray-casts + distributed over 360 degrees each detecting 4 possible object types, along + with the object's distance and 3 observation stacks. + - Goalie Actions: 3 discrete branched actions corresponding + to forward, backward, sideways movement, as well as rotation. + - Visual Observations: None +- Float Properties: Two + - ball_scale: Specifies the scale of the ball in the 3 dimensions (equal + across the three dimensions) + - Default: 7.5 + - Recommended minimum: 4 + - Recommended maximum: 10 + - gravity: Magnitude of the gravity + - Default: 9.81 + - Recommended minimum: 6 + - Recommended maximum: 20 + +## Walker + +![Walker](images/walker.png) + +- Set-up: Physics-based Humanoid agents with 26 degrees of freedom. These DOFs + correspond to articulation of the following body-parts: hips, chest, spine, + head, thighs, shins, feet, arms, forearms and hands. +- Goal: The agents must move its body toward the goal direction without falling. +- Agents: The environment contains 10 independent agents with same Behavior + Parameters. +- Agent Reward Function (independent): + The reward function is now geometric meaning the reward each step is a product + of all the rewards instead of a sum, this helps the agent try to maximize all + rewards instead of the easiest rewards. + - Body velocity matches goal velocity. (normalized between (0,1)) + - Head direction alignment with goal direction. 
(normalized between (0,1)) +- Behavior Parameters: + - Vector Observation space: 243 variables corresponding to position, rotation, + velocity, and angular velocities of each limb, along with goal direction. + - Actions: 39 continuous actions, corresponding to target + rotations and strength applicable to the joints. + - Visual Observations: None +- Float Properties: Four + - gravity: Magnitude of gravity + - Default: 9.81 + - Recommended Minimum: + - Recommended Maximum: + - hip_mass: Mass of the hip component of the walker + - Default: 8 + - Recommended Minimum: 7 + - Recommended Maximum: 28 + - chest_mass: Mass of the chest component of the walker + - Default: 8 + - Recommended Minimum: 3 + - Recommended Maximum: 20 + - spine_mass: Mass of the spine component of the walker + - Default: 8 + - Recommended Minimum: 3 + - Recommended Maximum: 20 +- Benchmark Mean Reward : 2500 + + +## Pyramids + +![Pyramids](images/pyramids.png) + +- Set-up: Environment where the agent needs to press a button to spawn a + pyramid, then navigate to the pyramid, knock it over, and move to the gold + brick at the top. +- Goal: Move to the golden brick on top of the spawned pyramid. +- Agents: The environment contains one agent. +- Agent Reward Function (independent): + - +2 For moving to golden brick (minus 0.001 per step). +- Behavior Parameters: + - Vector Observation space: 148 corresponding to local ray-casts detecting + switch, bricks, golden brick, and walls, plus variable indicating switch + state. + - Actions: 1 discrete action branch, with 4 actions corresponding to agent rotation and + forward/backward movement. +- Float Properties: None +- Benchmark Mean Reward: 1.75 + +## Match 3 +![Match 3](images/match3.png) + +- Set-up: Simple match-3 game. Matched pieces are removed, and remaining pieces +drop down. New pieces are spawned randomly at the top, with a chance of being +"special". +- Goal: Maximize score from matching pieces. +- Agents: The environment contains several independent Agents. +- Agent Reward Function (independent): + - .01 for each normal piece cleared. Special pieces are worth 2x or 3x. +- Behavior Parameters: + - None + - Observations and actions are defined with a sensor and actuator respectively. +- Float Properties: None +- Benchmark Mean Reward: + - 39.5 for visual observations + - 38.5 for vector observations + - 34.2 for simple heuristic (pick a random valid move) + - 37.0 for greedy heuristic (pick the highest-scoring valid move) + +## Sorter +![Sorter](images/sorter.png) + + - Set-up: The Agent is in a circular room with numbered tiles. The values of the + tiles are random between 1 and 20. The tiles present in the room are randomized + at each episode. When the Agent visits a tile, it turns green. + - Goal: Visit all the tiles in ascending order. + - Agents: The environment contains a single Agent + - Agent Reward Function: + - -.0002 Existential penalty. + - +1 For visiting the right tile + - -1 For visiting the wrong tile + - BehaviorParameters: + - Vector Observations : 4 : 2 floats for Position and 2 floats for orientation + - Variable Length Observations : Between 1 and 20 entities (one for each tile) + each with 22 observations, the first 20 are one hot encoding of the value of the tile, + the 21st and 22nd represent the position of the tile relative to the Agent and the 23rd + is `1` if the tile was visited and `0` otherwise. + - Actions: 3 discrete branched actions corresponding to forward, backward, + sideways movement, as well as rotation. 
+ - Float Properties: One + - num_tiles: The maximum number of tiles to sample. + - Default: 2 + - Recommended Minimum: 1 + - Recommended Maximum: 20 + - Benchmark Mean Reward: Depends on the number of tiles. + +## Cooperative Push Block +![CoopPushBlock](images/cooperative_pushblock.png) + +- Set-up: Similar to Push Block, the agents are in an area with blocks that need +to be pushed into a goal. Small blocks can be pushed by one agents and are worth ++1 value, medium blocks require two agents to push in and are worth +2, and large +blocks require all 3 agents to push and are worth +3. +- Goal: Push all blocks into the goal. +- Agents: The environment contains three Agents in a Multi Agent Group. +- Agent Reward Function: + - -0.0001 Existential penalty, as a group reward. + - +1, +2, or +3 for pushing in a block, added as a group reward. +- Behavior Parameters: + - Observation space: A single Grid Sensor with separate tags for each block size, + the goal, the walls, and other agents. + - Actions: 1 discrete action branch with 7 actions, corresponding to turn clockwise + and counterclockwise, move along four different face directions, or do nothing. +- Float Properties: None +- Benchmark Mean Reward: 11 (Group Reward) + +## Dungeon Escape +![DungeonEscape](images/dungeon_escape.png) + +- Set-up: Agents are trapped in a dungeon with a dragon, and must work together to escape. + To retrieve the key, one of the agents must find and slay the dragon, sacrificing itself + to do so. The dragon will drop a key for the others to use. The other agents can then pick + up this key and unlock the dungeon door. If the agents take too long, the dragon will escape + through a portal and the environment resets. +- Goal: Unlock the dungeon door and leave. +- Agents: The environment contains three Agents in a Multi Agent Group and one Dragon, which + moves in a predetermined pattern. +- Agent Reward Function: + - +1 group reward if any agent successfully unlocks the door and leaves the dungeon. +- Behavior Parameters: + - Observation space: A Ray Perception Sensor with separate tags for the walls, other agents, + the door, key, the dragon, and the dragon's portal. A single Vector Observation which indicates + whether the agent is holding a key. + - Actions: 1 discrete action branch with 7 actions, corresponding to turn clockwise + and counterclockwise, move along four different face directions, or do nothing. +- Float Properties: None +- Benchmark Mean Reward: 1.0 (Group Reward) diff --git a/com.unity.ml-agents/Documentation~/Learning-Environment-Executable.md b/com.unity.ml-agents/Documentation~/Learning-Environment-Executable.md new file mode 100644 index 0000000000..f5ea6936cc --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Learning-Environment-Executable.md @@ -0,0 +1,198 @@ +# Using an Environment Executable + +This section will help you create and use built environments rather than the +Editor to interact with an environment. Using an executable has some advantages +over using the Editor: + +- You can exchange executable with other people without having to share your + entire repository. +- You can put your executable on a remote machine for faster training. +- You can use `Server Build` (`Headless`) mode for faster training (as long as the executable does not need rendering). +- You can keep using the Unity Editor for other tasks while the agents are + training. + +## Building the 3DBall environment + +The first step is to open the Unity scene containing the 3D Balance Ball +environment: + +1. 
Launch Unity.
+1. On the Projects dialog, choose the **Open** option at the top of the window.
+1. Using the file dialog that opens, locate the `Project` folder within the
+   ML-Agents project and click **Open**.
+1. In the **Project** window, navigate to the folder
+   `Assets/ML-Agents/Examples/3DBall/Scenes/`.
+1. Double-click the `3DBall` file to load the scene containing the Balance Ball
+   environment.
+
+![3DBall Scene](images/mlagents-Open3DBall.png)
+
+Next, we want to set up the scene to play correctly when the training process
+launches our environment executable. This means:
+
+- The environment application runs in the background.
+- No dialogs require interaction.
+- The correct scene loads automatically.
+
+1. Open Player Settings (menu: **Edit** > **Project Settings** > **Player**).
+1. Under **Resolution and Presentation**:
+   - Ensure that **Run in Background** is Checked.
+   - Ensure that **Display Resolution Dialog** is set to Disabled. (Note: this
+     setting may not be available in newer versions of the editor.)
+1. Open the Build Settings window (menu: **File** > **Build Settings**).
+1. Choose your target platform.
+   - (optional) Select “Development Build” to
+     [log debug messages](https://docs.unity3d.com/Manual/LogFiles.html).
+1. If any scenes are shown in the **Scenes in Build** list, make sure that the
+   3DBall Scene is the only one checked. (If the list is empty, then only the
+   current scene is included in the build).
+1. Click **Build**:
+   - In the File dialog, navigate to your ML-Agents directory.
+   - Assign a file name and click **Save**.
+   - (For Windows) With Unity 2018.1, it will ask you to select a folder instead
+     of a file name. Create a subfolder within the root directory and select
+     that folder to build. In the following steps you will refer to this
+     subfolder's name as `env_name`. You cannot create builds in the Assets
+     folder.
+
+![Build Window](images/mlagents-BuildWindow.png)
+
+Now that we have a Unity executable containing the simulation environment, we
+can interact with it.
+
+## Interacting with the Environment
+
+If you want to use the [Python API](Python-LLAPI.md) to interact with your
+executable, you can pass the name of the executable via the `file_name` argument
+of the `UnityEnvironment` constructor. For instance:
+
+```python
+from mlagents_envs.environment import UnityEnvironment
+env = UnityEnvironment(file_name=<env_name>)
+```
+
+## Training the Environment
+
+1. Open a command or terminal window.
+1. Navigate to the folder where you installed the ML-Agents Toolkit. If you
+   followed the default [installation](Installation.md), then navigate to the
+   `ml-agents/` folder.
+1. Run
+   `mlagents-learn <trainer-config-file> --env=<env_name> --run-id=<run-identifier>`
+   Where:
+   - `<trainer-config-file>` is the file path of the trainer configuration YAML
+     file
+   - `<env_name>` is the name and path to the executable you exported from Unity
+     (without extension)
+   - `<run-identifier>` is a string used to separate the results of different
+     training runs
+
+For example, if you are training with a 3DBall executable, and you saved it to
+the directory where you installed the ML-Agents Toolkit, run:
+
+```sh
+mlagents-learn config/ppo/3DBall.yaml --env=3DBall --run-id=firstRun
+```
+
+And you should see something like:
+
+```console
+ml-agents$ mlagents-learn config/ppo/3DBall.yaml --env=3DBall --run-id=first-run
+
+
+                        ▄▄▄▓▓▓▓
+                   ╓▓▓▓▓▓▓█▓▓▓▓▓
+              ,▄▄▄m▀▀▀'  ,▓▓▓▀▓▓▄                           ▓▓▓  ▓▓▌
+            ▄▓▓▓▀'      ▄▓▓▀  ▓▓▓      ▄▄     ▄▄ ,▄▄ ▄▄▄▄   ,▄▄ ▄▓▓▌▄ ▄▄▄    ,▄▄
+          ▄▓▓▓▀        ▄▓▓▀   ▐▓▓▌     ▓▓▌   ▐▓▓ ▐▓▓▓▀▀▀▓▓▌ ▓▓▓ ▀▓▓▌▀ ^▓▓▌  ╒▓▓▌
+        ▄▓▓▓▓▓▄▄▄▄▄▄▄▄▓▓▓      ▓▀      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌   ▐▓▓▄ ▓▓▌
+        ▀▓▓▓▓▀▀▀▀▀▀▀▀▀▀▓▓▄     ▓▓      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌    ▐▓▓▐▓▓
+          ^█▓▓▓        ▀▓▓▄   ▐▓▓▌     ▓▓▓▓▄▓▓▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▓▄    ▓▓▓▓`
+            '▀▓▓▓▄      ^▓▓▓  ▓▓▓       └▀▀▀▀ ▀▀ ^▀▀    `▀▀ `▀▀   '▀▀    ▐▓▓▌
+               ▀▀▀▀▓▄▄▄   ▓▓▓▓▓▓,                                      ▓▓▓▓▀
+                   `▀█▓▓▓▓▓▓▓▓▓▌
+                        ¬`▀▀▀█▓
+
+```
+
+**Note**: If you're using Anaconda, don't forget to activate the ml-agents
+environment first.
+
+If `mlagents-learn` runs correctly and starts training, you should see something
+like this:
+
+```console
+CrashReporter: initialized
+Mono path[0] = '/Users/dericp/workspace/ml-agents/3DBall.app/Contents/Resources/Data/Managed'
+Mono config path = '/Users/dericp/workspace/ml-agents/3DBall.app/Contents/MonoBleedingEdge/etc'
+INFO:mlagents_envs:
+'Ball3DAcademy' started successfully!
+Unity Academy name: Ball3DAcademy
+
+INFO:mlagents_envs:Connected new brain:
+Unity brain name: Ball3DLearning
+        Number of Visual Observations (per agent): 0
+        Vector Observation space size (per agent): 8
+        Number of stacked Vector Observation: 1
+INFO:mlagents_envs:Hyperparameters for the PPO Trainer of brain Ball3DLearning:
+        batch_size: 64
+        beta: 0.001
+        buffer_size: 12000
+        epsilon: 0.2
+        gamma: 0.995
+        hidden_units: 128
+        lambd: 0.99
+        learning_rate: 0.0003
+        max_steps: 5.0e4
+        normalize: True
+        num_epoch: 3
+        num_layers: 2
+        time_horizon: 1000
+        sequence_length: 64
+        summary_freq: 1000
+        use_recurrent: False
+        memory_size: 256
+        use_curiosity: False
+        curiosity_strength: 0.01
+        curiosity_enc_size: 128
+        output_path: ./results/first-run-0/Ball3DLearning
+INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 3000. Mean Reward: 1.804. Std of Reward: 1.056. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 4000. Mean Reward: 2.151. Std of Reward: 1.432. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 5000. Mean Reward: 3.175. Std of Reward: 2.250. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 6000. Mean Reward: 4.898. Std of Reward: 4.019. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 7000. Mean Reward: 6.716. Std of Reward: 5.125. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 8000. Mean Reward: 12.124. Std of Reward: 11.929. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 9000. Mean Reward: 18.151. Std of Reward: 16.871. Training.
+INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 10000. Mean Reward: 27.284. Std of Reward: 28.667. Training.
+``` + +You can press Ctrl+C to stop the training, and your trained model will be at +`results//.onnx`, which corresponds to your model's +latest checkpoint. (**Note:** There is a known bug on Windows that causes the +saving of the model to fail when you early terminate the training, it's +recommended to wait until Step has reached the max_steps parameter you set in +your config YAML.) You can now embed this trained model into your Agent by +following the steps below: + +1. Move your model file into + `Project/Assets/ML-Agents/Examples/3DBall/TFModels/`. +1. Open the Unity Editor, and select the **3DBall** scene as described above. +1. Select the **3DBall** prefab from the Project window and select **Agent**. +1. Drag the `.onnx` file from the Project window of the Editor to + the **Model** placeholder in the **Ball3DAgent** inspector window. +1. Press the **Play** button at the top of the Editor. + +## Training on Headless Server + +To run training on headless server with no graphics rendering support, you need to turn off +graphics display in the Unity executable. There are two ways to achieve this: +1. Pass `--no-graphics` option to mlagents-learn training command. This is equivalent to + adding `-nographics -batchmode` to the Unity executable's commandline. +2. Build your Unity executable with **Server Build**. You can find this setting in Build Settings + in the Unity Editor. + +If you want to train with graphics (for example, using camera and visual observations), you'll +need to set up display rendering support (e.g. xvfb) on you server machine. In our +[Colab Notebook Tutorials](ML-Agents-Toolkit-Documentation.md#python-tutorial-with-google-colab), the Setup section has +examples of setting up xvfb on servers. diff --git a/com.unity.ml-agents/Documentation~/Limitations.md b/com.unity.ml-agents/Documentation~/Limitations.md new file mode 100644 index 0000000000..4ef89af32e --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Limitations.md @@ -0,0 +1,7 @@ +# Limitations + +See the package-specific Limitations pages: + +- [`com.unity.mlagents` Unity package](https://docs.unity3d.com/Packages/com.unity.ml-agents@2.3/manual/index.html#known-limitations) +- [`mlagents` Python package](../ml-agents/README.md#limitations) +- [`mlagents_envs` Python package](../ml-agents-envs/README.md#limitations) diff --git a/com.unity.ml-agents/Documentation~/ML-Agents-Overview.md b/com.unity.ml-agents/Documentation~/ML-Agents-Overview.md new file mode 100644 index 0000000000..5337fd7199 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/ML-Agents-Overview.md @@ -0,0 +1,820 @@ +# ML-Agents Toolkit Overview + +**Table of Contents** + +- [Running Example: Training NPC Behaviors](#running-example-training-npc-behaviors) +- [Key Components](#key-components) +- [Training Modes](#training-modes) + - [Built-in Training and Inference](#built-in-training-and-inference) + - [Cross-Platform Inference](#cross-platform-inference) + - [Custom Training and Inference](#custom-training-and-inference) +- [Flexible Training Scenarios](#flexible-training-scenarios) +- [Training Methods: Environment-agnostic](#training-methods-environment-agnostic) + - [A Quick Note on Reward Signals](#a-quick-note-on-reward-signals) + - [Deep Reinforcement Learning](#deep-reinforcement-learning) + - [Curiosity for Sparse-reward Environments](#curiosity-for-sparse-reward-environments) + - [RND for Sparse-reward Environments](#rnd-for-sparse-reward-environments) + - [Imitation Learning](#imitation-learning) + - [GAIL (Generative 
Adversarial Imitation Learning)](#gail-generative-adversarial-imitation-learning) + - [Behavioral Cloning (BC)](#behavioral-cloning-bc) + - [Recording Demonstrations](#recording-demonstrations) + - [Summary](#summary) +- [Training Methods: Environment-specific](#training-methods-environment-specific) + - [Training in Competitive Multi-Agent Environments with Self-Play](#training-in-competitive-multi-agent-environments-with-self-play) + - [Training in Cooperative Multi-Agent Environments with MA-POCA](#training-in-cooperative-multi-agent-environments-with-ma-poca) + - [Solving Complex Tasks using Curriculum Learning](#solving-complex-tasks-using-curriculum-learning) + - [Training Robust Agents using Environment Parameter Randomization](#training-robust-agents-using-environment-parameter-randomization) +- [Model Types](#model-types) + - [Learning from Vector Observations](#learning-from-vector-observations) + - [Learning from Cameras using Convolutional Neural Networks](#learning-from-cameras-using-convolutional-neural-networks) + - [Learning from Variable Length Observations using Attention](#learning-from-variable-length-observations-using-attention) + - [Memory-enhanced Agents using Recurrent Neural Networks](#memory-enhanced-agents-using-recurrent-neural-networks) +- [Additional Features](#additional-features) +- [Summary and Next Steps](#summary-and-next-steps) + +**The Unity Machine Learning Agents Toolkit** (ML-Agents Toolkit) is an +open-source project that enables games and simulations to serve as environments +for training intelligent agents. Agents can be trained using reinforcement +learning, imitation learning, neuroevolution, or other machine learning methods +through a simple-to-use Python API. We also provide implementations (based on +PyTorch) of state-of-the-art algorithms to enable game developers and +hobbyists to easily train intelligent agents for 2D, 3D and VR/AR games. These +trained agents can be used for multiple purposes, including controlling NPC +behavior (in a variety of settings such as multi-agent and adversarial), +automated testing of game builds and evaluating different game design decisions +pre-release. The ML-Agents Toolkit is mutually beneficial for both game +developers and AI researchers as it provides a central platform where advances +in AI can be evaluated on Unity’s rich environments and then made accessible to +the wider research and game developer communities. + +Depending on your background (i.e. researcher, game developer, hobbyist), you +may have very different questions on your mind at the moment. To make your +transition to the ML-Agents Toolkit easier, we provide several background pages +that include overviews and helpful resources on the +[Unity Engine](Background-Unity.md), +[machine learning](Background-Machine-Learning.md) and +[PyTorch](Background-PyTorch.md). We **strongly** recommend browsing the +relevant background pages if you're not familiar with a Unity scene, basic +machine learning concepts or have not previously heard of PyTorch. + +The remainder of this page contains a deep dive into ML-Agents, its key +components, different training modes and scenarios. By the end of it, you should +have a good sense of _what_ the ML-Agents Toolkit allows you to do. The +subsequent documentation pages provide examples of _how_ to use ML-Agents. To +get started, watch this +[demo video of ML-Agents in action](https://www.youtube.com/watch?v=fiQsmdwEGT8&feature=youtu.be). 
+ +## Running Example: Training NPC Behaviors + +To help explain the material and terminology in this page, we'll use a +hypothetical, running example throughout. We will explore the problem of +training the behavior of a non-playable character (NPC) in a game. (An NPC is a +game character that is never controlled by a human player and its behavior is +pre-defined by the game developer.) More specifically, let's assume we're +building a multi-player, war-themed game in which players control the soldiers. +In this game, we have a single NPC who serves as a medic, finding and reviving +wounded players. Lastly, let us assume that there are two teams, each with five +players and one NPC medic. + +The behavior of a medic is quite complex. It first needs to avoid getting +injured, which requires detecting when it is in danger and moving to a safe +location. Second, it needs to be aware of which of its team members are injured +and require assistance. In the case of multiple injuries, it needs to assess the +degree of injury and decide who to help first. Lastly, a good medic will always +place itself in a position where it can quickly help its team members. Factoring +in all of these traits means that at every instance, the medic needs to measure +several attributes of the environment (e.g. position of team members, position +of enemies, which of its team members are injured and to what degree) and then +decide on an action (e.g. hide from enemy fire, move to help one of its +members). Given the large number of settings of the environment and the large +number of actions that the medic can take, defining and implementing such +complex behaviors by hand is challenging and prone to errors. + +With ML-Agents, it is possible to _train_ the behaviors of such NPCs (called +**Agents**) using a variety of methods. The basic idea is quite simple. We need +to define three entities at every moment of the game (called **environment**): + +- **Observations** - what the medic perceives about the environment. + Observations can be numeric and/or visual. Numeric observations measure + attributes of the environment from the point of view of the agent. For our + medic this would be attributes of the battlefield that are visible to it. For + most interesting environments, an agent will require several continuous + numeric observations. Visual observations, on the other hand, are images + generated from the cameras attached to the agent and represent what the agent + is seeing at that point in time. It is common to confuse an agent's + observation with the environment (or game) **state**. The environment state + represents information about the entire scene containing all the game + characters. The agents observation, however, only contains information that + the agent is aware of and is typically a subset of the environment state. For + example, the medic observation cannot include information about an enemy in + hiding that the medic is unaware of. +- **Actions** - what actions the medic can take. Similar to observations, + actions can either be continuous or discrete depending on the complexity of + the environment and agent. In the case of the medic, if the environment is a + simple grid world where only their location matters, then a discrete action + taking on one of four values (north, south, east, west) suffices. However, if + the environment is more complex and the medic can move freely then using two + continuous actions (one for direction and another for speed) is more + appropriate. 
+- **Reward signals** - a scalar value indicating how well the medic is doing. + Note that the reward signal need not be provided at every moment, but only + when the medic performs an action that is good or bad. For example, it can + receive a large negative reward if it dies, a modest positive reward whenever + it revives a wounded team member, and a modest negative reward when a wounded + team member dies due to lack of assistance. Note that the reward signal is how + the objectives of the task are communicated to the agent, so they need to be + set up in a manner where maximizing reward generates the desired optimal + behavior. + +After defining these three entities (the building blocks of a **reinforcement +learning task**), we can now _train_ the medic's behavior. This is achieved by +simulating the environment for many trials where the medic, over time, learns +what is the optimal action to take for every observation it measures by +maximizing its future reward. The key is that by learning the actions that +maximize its reward, the medic is learning the behaviors that make it a good +medic (i.e. one who saves the most number of lives). In **reinforcement +learning** terminology, the behavior that is learned is called a **policy**, +which is essentially a (optimal) mapping from observations to actions. Note that +the process of learning a policy through running simulations is called the +**training phase**, while playing the game with an NPC that is using its learned +policy is called the **inference phase**. + +The ML-Agents Toolkit provides all the necessary tools for using Unity as the +simulation engine for learning the policies of different objects in a Unity +environment. In the next few sections, we discuss how the ML-Agents Toolkit +achieves this and what features it provides. + +## Key Components + +The ML-Agents Toolkit contains five high-level components: + +- **Learning Environment** - which contains the Unity scene and all the game + characters. The Unity scene provides the environment in which agents observe, + act, and learn. How you set up the Unity scene to serve as a learning + environment really depends on your goal. You may be trying to solve a specific + reinforcement learning problem of limited scope, in which case you can use the + same scene for both training and for testing trained agents. Or, you may be + training agents to operate in a complex game or simulation. In this case, it + might be more efficient and practical to create a purpose-built training + scene. The ML-Agents Toolkit includes an ML-Agents Unity SDK + (`com.unity.ml-agents` package) that enables you to transform any Unity scene + into a learning environment by defining the agents and their behaviors. +- **Python Low-Level API** - which contains a low-level Python interface for + interacting and manipulating a learning environment. Note that, unlike the + Learning Environment, the Python API is not part of Unity, but lives outside + and communicates with Unity through the Communicator. This API is contained in + a dedicated `mlagents_envs` Python package and is used by the Python training + process to communicate with and control the Academy during training. However, + it can be used for other purposes as well. For example, you could use the API + to use Unity as the simulation engine for your own machine learning + algorithms. See [Python API](Python-LLAPI.md) for more information. +- **External Communicator** - which connects the Learning Environment with the + Python Low-Level API. 
It lives within the Learning Environment.
+- **Python Trainers** - which contains all the machine learning algorithms that
+  enable training agents. The algorithms are implemented in Python and are part
+  of their own `mlagents` Python package. The package exposes a single
+  command-line utility, `mlagents-learn`, that supports all the training methods
+  and options outlined in this document. The Python Trainers interface solely
+  with the Python Low-Level API.
+- **Gym Wrapper** (not pictured). A common way in which machine learning
+  researchers interact with simulation environments is via a wrapper provided by
+  OpenAI called [gym](https://github.com/openai/gym). We provide a gym wrapper
+  in the `ml-agents-envs` package and [instructions](Python-Gym-API.md) for using
+  it with existing machine learning algorithms which utilize gym.
+- **PettingZoo Wrapper** (not pictured). PettingZoo is a Python API for
+  interacting with multi-agent simulation environments that provides a
+  gym-like interface. We provide a PettingZoo wrapper for Unity ML-Agents
+  environments in the `ml-agents-envs` package and
+  [instructions](Python-PettingZoo-API.md) for using it with machine learning
+  algorithms.
+
+_(Image: Simplified ML-Agents Scene Block Diagram)_
+
+_Simplified block diagram of ML-Agents._
+
+The Learning Environment contains two Unity Components that help organize the
+Unity scene:
+
+- **Agents** - which is attached to a Unity GameObject (any character within a
+  scene) and handles generating its observations, performing the actions it
+  receives and assigning a reward (positive / negative) when appropriate. Each
+  Agent is linked to a Behavior.
+- **Behavior** - defines specific attributes of the agent such as the number of
+  actions that agent can take. Each Behavior is uniquely identified by a
+  `Behavior Name` field. A Behavior can be thought of as a function that receives
+  observations and rewards from the Agent and returns actions. A Behavior can be
+  of one of three types: Learning, Heuristic or Inference. A Learning Behavior
+  is one that is not yet defined but is about to be trained. A Heuristic Behavior
+  is one that is defined by a hard-coded set of rules implemented in code. An
+  Inference Behavior is one that includes a trained Neural Network file. In
+  essence, after a Learning Behavior is trained, it becomes an Inference
+  Behavior.
+
+Every Learning Environment will always have one Agent for every character in the
+scene. While each Agent must be linked to a Behavior, it is possible for Agents
+that have similar observations and actions to have the same Behavior. In our
+sample game, we have two teams each with their own medic. Thus we will have two
+Agents in our Learning Environment, one for each medic, but both of these medics
+can have the same Behavior. This does not mean that at each instance they will
+have identical observation and action _values_.
+
+_(Image: Example ML-Agents Scene Block Diagram)_
+
+_Example block diagram of ML-Agents Toolkit for our sample game._
+
+Note that in a single environment, there can be multiple Agents and multiple
+Behaviors at the same time. For example, if we expanded our game to include tank
+driver NPCs, then the Agent attached to those characters cannot share its
+Behavior with the Agent linked to the medics (medics and drivers have different
+actions). The Learning Environment, through the Academy (not represented in the
+diagram), ensures that all the Agents are in sync, in addition to controlling
+environment-wide settings.
+
+Lastly, it is possible to exchange data between Unity and Python outside of the
+machine learning loop through _Side Channels_. One example of using _Side
+Channels_ is to exchange data with Python about _Environment Parameters_. The
+following diagram illustrates the above.
+
+_(Image: More Complete Example ML-Agents Scene Block Diagram)_
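+
+As a concrete illustration of the Heuristic Behavior type described above, the
+sketch below fills in an Agent's actions from hand-written rules (here, simply
+reading player input). The class name and input axes are illustrative
+assumptions only:
+
+```csharp
+using Unity.MLAgents;
+using Unity.MLAgents.Actuators;
+using UnityEngine;
+
+public class MedicAgent : Agent
+{
+    // Called when the Behavior Type is set to "Heuristic Only" (or while no
+    // trained model is available). It fills in the same action buffers a
+    // trained policy would otherwise produce.
+    public override void Heuristic(in ActionBuffers actionsOut)
+    {
+        var continuousActions = actionsOut.ContinuousActions;
+        continuousActions[0] = Input.GetAxis("Horizontal"); // move direction
+        continuousActions[1] = Input.GetAxis("Vertical");   // move speed
+    }
+}
+```
+
+Writing a Heuristic like this is also a convenient way to play-test an
+environment yourself before starting a training run.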

+ +## Training Modes + +Given the flexibility of ML-Agents, there are a few ways in which training and +inference can proceed. + +### Built-in Training and Inference + +As mentioned previously, the ML-Agents Toolkit ships with several +implementations of state-of-the-art algorithms for training intelligent agents. +More specifically, during training, all the medics in the scene send their +observations to the Python API through the External Communicator. The Python API +processes these observations and sends back actions for each medic to take. +During training these actions are mostly exploratory to help the Python API +learn the best policy for each medic. Once training concludes, the learned +policy for each medic can be exported as a model file. Then during the inference +phase, the medics still continue to generate their observations, but instead of +being sent to the Python API, they will be fed into their (internal, embedded) +model to generate the _optimal_ action for each medic to take at every point in +time. + +The [Getting Started Guide](Getting-Started.md) tutorial covers this training +mode with the **3D Balance Ball** sample environment. + +#### Cross-Platform Inference + +It is important to note that the ML-Agents Toolkit leverages the +[Inference Engine](Inference-Engine.md) to run the models within a +Unity scene such that an agent can take the _optimal_ action at each step. Given +that Inference Engine supports all Unity runtime platforms, this +means that any model you train with the ML-Agents Toolkit can be embedded into +your Unity application that runs on any platform. See our +[dedicated blog post](https://blogs.unity3d.com/2019/03/01/unity-ml-agents-toolkit-v0-7-a-leap-towards-cross-platform-inference/) +for additional information. + +### Custom Training and Inference + +In the previous mode, the Agents were used for training to generate a PyTorch +model that the Agents can later use. However, any user of the ML-Agents Toolkit +can leverage their own algorithms for training. In this case, the behaviors of +all the Agents in the scene will be controlled within Python. You can even turn +your environment into a [gym.](Python-Gym-API.md) + +We do not currently have a tutorial highlighting this mode, but you can learn +more about the Python API [here](Python-LLAPI.md). + +## Flexible Training Scenarios + +While the discussion so-far has mostly focused on training a single agent, with +ML-Agents, several training scenarios are possible. We are excited to see what +kinds of novel and fun environments the community creates. For those new to +training intelligent agents, below are a few examples that can serve as +inspiration: + +- Single-Agent. A single agent, with its own reward signal. The traditional way + of training an agent. An example is any single-player game, such as Chicken. +- Simultaneous Single-Agent. Multiple independent agents with independent reward + signals with same `Behavior Parameters`. A parallelized version of the + traditional training scenario, which can speed-up and stabilize the training + process. Helpful when you have multiple versions of the same character in an + environment who should learn similar behaviors. An example might be training a + dozen robot-arms to each open a door simultaneously. +- Adversarial Self-Play. Two interacting agents with inverse reward signals. In + two-player games, adversarial self-play can allow an agent to become + increasingly more skilled, while always having the perfectly matched opponent: + itself. 
This was the strategy employed when training AlphaGo, and more + recently used by OpenAI to train a human-beating 1-vs-1 Dota 2 agent. +- Cooperative Multi-Agent. Multiple interacting agents with a shared reward + signal with same or different `Behavior Parameters`. In this scenario, all + agents must work together to accomplish a task that cannot be done alone. + Examples include environments where each agent only has access to partial + information, which needs to be shared in order to accomplish the task or + collaboratively solve a puzzle. +- Competitive Multi-Agent. Multiple interacting agents with inverse reward + signals with same or different `Behavior Parameters`. In this scenario, agents + must compete with one another to either win a competition, or obtain some + limited set of resources. All team sports fall into this scenario. +- Ecosystem. Multiple interacting agents with independent reward signals with + same or different `Behavior Parameters`. This scenario can be thought of as + creating a small world in which animals with different goals all interact, + such as a savanna in which there might be zebras, elephants and giraffes, or + an autonomous driving simulation within an urban environment. + +## Training Methods: Environment-agnostic + +The remaining sections overview the various state-of-the-art machine learning +algorithms that are part of the ML-Agents Toolkit. If you aren't studying +machine and reinforcement learning as a subject and just want to train agents to +accomplish tasks, you can treat these algorithms as _black boxes_. There are a +few training-related parameters to adjust inside Unity as well as on the Python +training side, but you do not need in-depth knowledge of the algorithms +themselves to successfully create and train agents. Step-by-step procedures for +running the training process are provided in the +[Training ML-Agents](Training-ML-Agents.md) page. + +This section specifically focuses on the training methods that are available +regardless of the specifics of your learning environment. + +#### A Quick Note on Reward Signals + +In this section we introduce the concepts of _intrinsic_ and _extrinsic_ +rewards, which helps explain some of the training methods. + +In reinforcement learning, the end goal for the Agent is to discover a behavior +(a Policy) that maximizes a reward. You will need to provide the agent one or +more reward signals to use during training. Typically, a reward is defined by +your environment, and corresponds to reaching some goal. These are what we refer +to as _extrinsic_ rewards, as they are defined external of the learning +algorithm. + +Rewards, however, can be defined outside of the environment as well, to +encourage the agent to behave in certain ways, or to aid the learning of the +true extrinsic reward. We refer to these rewards as _intrinsic_ reward signals. +The total reward that the agent will learn to maximize can be a mix of extrinsic +and intrinsic reward signals. + +The ML-Agents Toolkit allows reward signals to be defined in a modular way, and +we provide four reward signals that can the mixed and matched to help shape +your agent's behavior: + +- `extrinsic`: represents the rewards defined in your environment, and is + enabled by default +- `gail`: represents an intrinsic reward signal that is defined by GAIL (see + below) +- `curiosity`: represents an intrinsic reward signal that encourages exploration + in sparse-reward environments that is defined by the Curiosity module (see + below). 
+- `rnd`: represents an intrinsic reward signal that encourages exploration
+  in sparse-reward environments and is defined by the Random Network
+  Distillation (RND) module (see below).
+
+### Deep Reinforcement Learning
+
+ML-Agents provides implementations of two reinforcement learning algorithms:
+
+- [Proximal Policy Optimization (PPO)](https://openai.com/research/openai-baselines-ppo)
+- [Soft Actor-Critic (SAC)](https://bair.berkeley.edu/blog/2018/12/14/sac/)
+
+The default algorithm is PPO. This is a method that has been shown to be more
+general purpose and stable than many other RL algorithms.
+
+In contrast with PPO, SAC is _off-policy_, which means it can learn from
+experiences collected at any time during the past. As experiences are collected,
+they are placed in an experience replay buffer and randomly drawn during
+training. This makes SAC significantly more sample-efficient, often requiring
+5-10 times fewer samples to learn the same task as PPO. However, SAC tends to
+require more model updates. SAC is a good choice for heavier or slower
+environments (about 0.1 seconds per step or more). SAC is also a "maximum
+entropy" algorithm, and enables exploration in an intrinsic way. Read more about
+maximum entropy RL
+[here](https://bair.berkeley.edu/blog/2017/10/06/soft-q-learning/).
+
+#### Curiosity for Sparse-reward Environments
+
+In environments where the agent receives rare or infrequent rewards (i.e.
+sparse-reward), an agent may never receive a reward signal on which to bootstrap
+its training process. This is a scenario where the use of an intrinsic reward
+signal can be valuable. Curiosity is one such signal, which can help the agent
+explore when extrinsic rewards are sparse.
+
+The `curiosity` Reward Signal enables the Intrinsic Curiosity Module. This is an
+implementation of the approach described in
+[Curiosity-driven Exploration by Self-supervised Prediction](https://pathak22.github.io/noreward-rl/)
+by Pathak, et al. It trains two networks:
+
+- an inverse model, which takes the current and next observation of the agent,
+  encodes them, and uses the encoding to predict the action that was taken
+  between the observations
+- a forward model, which takes the encoded current observation and action, and
+  predicts the next encoded observation.
+
+The loss of the forward model (the difference between the predicted and actual
+encoded observations) is used as the intrinsic reward, so the more surprised the
+model is, the larger the reward will be.
+
+For more information, see our dedicated
+[blog post on the Curiosity module](https://blogs.unity3d.com/2018/06/26/solving-sparse-reward-tasks-with-curiosity/).
+
+#### RND for Sparse-reward Environments
+
+Similarly to Curiosity, Random Network Distillation (RND) is useful in sparse or rare
+reward environments as it helps the Agent explore. The RND Module is implemented following
+the paper [Exploration by Random Network Distillation](https://arxiv.org/abs/1810.12894).
+RND uses two networks:
+
+- The first is a network with fixed random weights that takes observations as inputs and
+  generates an encoding
+- The second is a network with a similar architecture that is trained to predict the
+  outputs of the first network, using the observations the Agent collects as training data.
+
+The loss (the squared difference between the predicted and actual encoded observations)
+of the trained model is used as the intrinsic reward.
The more an Agent visits a state, the +more accurate the predictions and the lower the rewards which encourages the Agent to +explore new states with higher prediction errors. + +### Imitation Learning + +It is often more intuitive to simply demonstrate the behavior we want an agent +to perform, rather than attempting to have it learn via trial-and-error methods. +For example, instead of indirectly training a medic with the help of a reward +function, we can give the medic real world examples of observations from the +game and actions from a game controller to guide the medic's behavior. Imitation +Learning uses pairs of observations and actions from a demonstration to learn a +policy. See this [video demo](https://youtu.be/kpb8ZkMBFYs) of imitation +learning . + +Imitation learning can either be used alone or in conjunction with reinforcement +learning. If used alone it can provide a mechanism for learning a specific type +of behavior (i.e. a specific style of solving the task). If used in conjunction +with reinforcement learning it can dramatically reduce the time the agent takes +to solve the environment. This can be especially pronounced in sparse-reward +environments. For instance, on the +[Pyramids environment](Learning-Environment-Examples.md#pyramids), using 6 +episodes of demonstrations can reduce training steps by more than 4 times. See +Behavioral Cloning + GAIL + Curiosity + RL below. + +

+_Using Demonstrations with Reinforcement Learning_
+
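+
+In practice, such demonstrations are captured with the `DemonstrationRecorder`
+component (see Recording Demonstrations below). As a rough sketch, adding a
+recorder from code rather than through the Inspector might look like this (the
+demonstration name is a hypothetical placeholder):
+
+```csharp
+using Unity.MLAgents.Demonstrations;
+using UnityEngine;
+
+public class RecordMedicDemo : MonoBehaviour
+{
+    void Awake()
+    {
+        // Attach the recorder to the same GameObject as the Agent so that
+        // gameplay is saved as a .demo asset, which BC and GAIL can consume.
+        var recorder = gameObject.AddComponent<DemonstrationRecorder>();
+        recorder.DemonstrationName = "MedicDemo"; // hypothetical name
+        recorder.Record = true;
+    }
+}
+```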

+
+The ML-Agents Toolkit provides a way to learn directly from demonstrations, as
+well as use them to help speed up reward-based training (RL). We include two
+algorithms called Behavioral Cloning (BC) and Generative Adversarial Imitation
+Learning (GAIL). In most scenarios, you can combine these two features:
+
+- If you want to help your agents learn (especially with environments that have
+  sparse rewards) using pre-recorded demonstrations, you can generally enable
+  both GAIL and Behavioral Cloning at low strengths in addition to having an
+  extrinsic reward. An example of this is provided for the PushBlock example
+  environment in `config/imitation/PushBlock.yaml`.
+- If you want to train purely from demonstrations with GAIL and BC _without_ an
+  extrinsic reward signal, please see the CrawlerStatic example environment
+  in `config/imitation/CrawlerStatic.yaml`.
+
+***Note:*** GAIL introduces a [_survivor bias_](https://arxiv.org/pdf/1809.02925.pdf)
+to the learning process. That is, by giving positive rewards based on similarity
+to the expert, the agent is incentivized to remain alive for as long as possible.
+This can directly conflict with goal-oriented tasks like our PushBlock or Pyramids
+example environments where an agent must reach a goal state, thus ending the
+episode as quickly as possible. In these cases, we strongly recommend that you
+use a low-strength GAIL reward signal and a sparse extrinsic signal given when
+the agent achieves the task. This way, the GAIL reward signal will guide the
+agent until it discovers the extrinsic signal and will not overpower it. If the
+agent appears to be ignoring the extrinsic reward signal, you should reduce
+the strength of GAIL.
+
+#### GAIL (Generative Adversarial Imitation Learning)
+
+GAIL, or
+[Generative Adversarial Imitation Learning](https://arxiv.org/abs/1606.03476),
+uses an adversarial approach to reward your Agent for behaving similarly to a set
+of demonstrations. GAIL can be used with or without environment rewards, and
+works well when there are a limited number of demonstrations. In this framework,
+a second neural network, the discriminator, is taught to distinguish whether an
+observation/action is from a demonstration or produced by the agent. This
+discriminator can then examine a new observation/action and provide it a reward
+based on how close it believes this new observation/action is to the provided
+demonstrations.
+
+At each training step, the agent tries to learn how to maximize this reward.
+Then, the discriminator is trained to better distinguish between demonstrations
+and agent state/actions. In this way, while the agent gets better and better at
+mimicking the demonstrations, the discriminator keeps getting stricter and
+stricter and the agent must try harder to "fool" it.
+
+This approach learns a _policy_ that produces states and actions similar to the
+demonstrations, requiring fewer demonstrations than direct cloning of the
+actions. In addition to learning purely from demonstrations, the GAIL reward
+signal can be mixed with an extrinsic reward signal to guide the learning
+process.
+
+#### Behavioral Cloning (BC)
+
+BC trains the Agent's policy to exactly mimic the actions shown in a set of
+demonstrations. The BC feature can be enabled on the PPO or SAC trainers.
+As BC cannot generalize past the examples shown in the demonstrations, BC tends
+to work best when demonstrations exist for nearly all of the states that the
+agent can experience, or in conjunction with GAIL and/or an extrinsic reward.
+
+#### Recording Demonstrations
+
+Demonstrations of agent behavior can be recorded from the Unity Editor or build,
+and saved as assets. These demonstrations contain information on the
+observations, actions, and rewards for a given agent during the recording
+session. They can be managed in the Editor, as well as used for training with BC
+and GAIL. See the
+[Designing Agents](Learning-Environment-Design-Agents.md#recording-demonstrations)
+page for more information on how to record demonstrations for your agent.
+
+### Summary
+
+To summarize, we provide three training methods: BC, GAIL and RL (PPO or SAC) that
+can be used independently or together:
+
+- BC can be used on its own or as a pre-training step before GAIL and/or RL
+- GAIL can be used with or without extrinsic rewards
+- RL can be used on its own (either PPO or SAC) or in conjunction with BC and/or
+  GAIL.
+
+Leveraging either BC or GAIL requires recorded demonstrations to be provided as
+input to the training algorithms.
+
+## Training Methods: Environment-specific
+
+In addition to the three environment-agnostic training methods introduced in the
+previous section, the ML-Agents Toolkit provides additional methods that can aid
+in training behaviors for specific types of environments.
+
+### Training in Competitive Multi-Agent Environments with Self-Play
+
+ML-Agents provides the functionality to train both symmetric and asymmetric
+adversarial games with
+[Self-Play](https://openai.com/research/competitive-self-play). A symmetric game is
+one in which opposing agents are equal in form, function and objective. Examples
+of symmetric games are our Tennis and Soccer example environments. In
+reinforcement learning, this means both agents have the same observations and
+actions and learn from the same reward function and so _they can share the
+same policy_. In asymmetric games, this is not the case. An example of an
+asymmetric game is Hide and Seek. Agents in these types of games do not always
+have the same observations or actions and so sharing policy networks is not
+necessarily ideal.
+
+With self-play, an agent learns in adversarial games by competing against fixed,
+past versions of its opponent (which could be itself as in symmetric games) to
+provide a more stable, stationary learning environment. This is compared to
+competing against the current, best opponent in every episode, which is
+constantly changing (because it's learning).
+
+Self-play can be used with our implementations of both Proximal Policy
+Optimization (PPO) and Soft Actor-Critic (SAC). However, from the perspective of
+an individual agent, these scenarios appear to have non-stationary dynamics
+because the opponent is often changing. This can cause significant issues in the
+experience replay mechanism used by SAC. Thus, we recommend that users use PPO.
+For further reading on this issue in particular, see the paper
+[Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/1702.08887.pdf).
+
+See our
+[Designing Agents](Learning-Environment-Design-Agents.md#defining-teams-for-multi-agent-scenarios)
+page for more information on setting up teams in your Unity scene.
Also, read +our +[blog post on self-play](https://blogs.unity3d.com/2020/02/28/training-intelligent-adversaries-using-self-play-with-ml-agents/) +for additional information. Additionally, check [ELO Rating System](ELO-Rating-System.md) the method we use to calculate +the relative skill level between two players. + +### Training In Cooperative Multi-Agent Environments with MA-POCA + +![PushBlock with Agents Working Together](images/cooperative_pushblock.png) + +ML-Agents provides the functionality for training cooperative behaviors - i.e., +groups of agents working towards a common goal, where the success of the individual +is linked to the success of the whole group. In such a scenario, agents typically receive +rewards as a group. For instance, if a team of agents wins a game against an opposing +team, everyone is rewarded - even agents who did not directly contribute to the win. This +makes learning what to do as an individual difficult - you may get a win +for doing nothing, and a loss for doing your best. + +In ML-Agents, we provide MA-POCA (MultiAgent POsthumous Credit Assignment), which +is a novel multi-agent trainer that trains a _centralized critic_, a neural network +that acts as a "coach" for a whole group of agents. You can then give rewards to the team +as a whole, and the agents will learn how best to contribute to achieving that reward. +Agents can _also_ be given rewards individually, and the team will work together to help the +individual achieve those goals. During an episode, agents can be added or removed from the group, +such as when agents spawn or die in a game. If agents are removed mid-episode (e.g., if teammates die +or are removed from the game), they will still learn whether their actions contributed +to the team winning later, enabling agents to take group-beneficial actions even if +they result in the individual being removed from the game (i.e., self-sacrifice). +MA-POCA can also be combined with self-play to train teams of agents to play against each other. + +To learn more about enabling cooperative behaviors for agents in an ML-Agents environment, +check out [this page](Learning-Environment-Design-Agents.md#groups-for-cooperative-scenarios). + +To learn more about MA-POCA, please see our paper +[On the Use and Misuse of Absorbing States in Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/2111.05992.pdf). +For further reading, MA-POCA builds on previous work in multi-agent cooperative learning +([Lowe et al.](https://arxiv.org/abs/1706.02275), [Foerster et al.](https://arxiv.org/pdf/1705.08926.pdf), +among others) to enable the above use-cases. + +### Solving Complex Tasks using Curriculum Learning + +Curriculum learning is a way of training a machine learning model where more +difficult aspects of a problem are gradually introduced in such a way that the +model is always optimally challenged. This idea has been around for a long time, +and it is how we humans typically learn. If you imagine any childhood primary +school education, there is an ordering of classes and topics. Arithmetic is +taught before algebra, for example. Likewise, algebra is taught before calculus. +The skills and knowledge learned in the earlier subjects provide a scaffolding +for later lessons. The same principle can be applied to machine learning, where +training on easier tasks can provide a scaffolding for harder tasks in the +future. + +Imagine training the medic to scale a wall to arrive at a wounded team +member. 
The starting point when training a medic to accomplish this task will be +a random policy. That starting policy will have the medic running in circles, +and will likely never, or very rarely scale the wall properly to revive their +team member (and achieve the reward). If we start with a simpler task, such as +moving toward an unobstructed team member, then the medic can easily learn to +accomplish the task. From there, we can slowly add to the difficulty of the task +by increasing the size of the wall until the medic can complete the initially +near-impossible task of scaling the wall. We have included an environment to +demonstrate this with ML-Agents, called +[Wall Jump](Learning-Environment-Examples.md#wall-jump). + +![Wall](images/curriculum.png) + +_Demonstration of a hypothetical curriculum training scenario in which a +progressively taller wall obstructs the path to the goal._ + +_[**Note**: The example provided above is for instructional purposes, and was +based on an early version of the +[Wall Jump example environment](Learning-Environment-Examples.md). As such, it +is not possible to directly replicate the results here using that environment.]_ + +The ML-Agents Toolkit supports modifying custom environment parameters during +the training process to aid in learning. This allows elements of the environment +related to difficulty or complexity to be dynamically adjusted based on training +progress. The [Training ML-Agents](Training-ML-Agents.md#curriculum-learning) +page has more information on defining training curriculums. + +### Training Robust Agents using Environment Parameter Randomization + +An agent trained on a specific environment, may be unable to generalize to any +tweaks or variations in the environment (in machine learning this is referred to +as overfitting). This becomes problematic in cases where environments are +instantiated with varying objects or properties. One mechanism to alleviate this +and train more robust agents that can generalize to unseen variations of the +environment is to expose them to these variations during training. Similar to +Curriculum Learning, where environments become more difficult as the agent +learns, the ML-Agents Toolkit provides a way to randomly sample parameters of +the environment during training. We refer to this approach as **Environment +Parameter Randomization**. For those familiar with Reinforcement Learning +research, this approach is based on the concept of +[Domain Randomization](https://arxiv.org/abs/1703.06907). By using +[parameter randomization during training](Training-ML-Agents.md#environment-parameter-randomization), +the agent can be better suited to adapt (with higher performance) to future +unseen variations of the environment. + +| Ball scale of 0.5 | Ball scale of 4 | +| :--------------------------: | :------------------------: | +| ![](images/3dball_small.png) | ![](images/3dball_big.png) | + +_Example of variations of the 3D Ball environment. The environment parameters +are `gravity`, `ball_mass` and `ball_scale`._ + +## Model Types + +Regardless of the training method deployed, there are a few model types that +users can train using the ML-Agents Toolkit. This is due to the flexibility in +defining agent observations, which include vector, ray cast and visual +observations. You can learn more about how to instrument an agent's observation +in the [Designing Agents](Learning-Environment-Design-Agents.md) guide. 
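+
+As a rough illustration of what instrumenting observations looks like on the
+C# side, an Agent might add a few vector observations as follows (a minimal
+sketch; the `target` field and the quantities observed are hypothetical):
+
+```csharp
+using Unity.MLAgents;
+using Unity.MLAgents.Sensors;
+using UnityEngine;
+
+public class MedicAgent : Agent
+{
+    public Transform target;  // hypothetical goal object to move toward
+    Rigidbody m_Body;
+
+    public override void Initialize()
+    {
+        m_Body = GetComponent<Rigidbody>();
+    }
+
+    public override void CollectObservations(VectorSensor sensor)
+    {
+        // 3 floats: where the target is relative to the agent.
+        sensor.AddObservation(target.position - transform.position);
+        // 3 floats: the agent's current velocity.
+        sensor.AddObservation(m_Body.velocity);
+    }
+}
+```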
+ +### Learning from Vector Observations + +Whether an agent's observations are ray cast or vector, the ML-Agents Toolkit +provides a fully connected neural network model to learn from those +observations. At training time you can configure different aspects of this model +such as the number of hidden units and number of layers. + +### Learning from Cameras using Convolutional Neural Networks + +Unlike other platforms, where the agent’s observation might be limited to a +single vector or image, the ML-Agents Toolkit allows multiple cameras to be used +for observations per agent. This enables agents to learn to integrate +information from multiple visual streams. This can be helpful in several +scenarios such as training a self-driving car which requires multiple cameras +with different viewpoints, or a navigational agent which might need to integrate +aerial and first-person visuals. You can learn more about adding visual +observations to an agent +[here](Learning-Environment-Design-Agents.md#multiple-visual-observations). + +When visual observations are utilized, the ML-Agents Toolkit leverages +convolutional neural networks (CNN) to learn from the input images. We offer +three network architectures: + +- a simple encoder which consists of two convolutional layers +- the implementation proposed by + [Mnih et al.](https://www.nature.com/articles/nature14236), consisting of + three convolutional layers, +- the [IMPALA Resnet](https://arxiv.org/abs/1802.01561) consisting of three + stacked layers, each with two residual blocks, making a much larger network + than the other two. + +The choice of the architecture depends on the visual complexity of the scene and +the available computational resources. + +### Learning from Variable Length Observations using Attention + +Using the ML-Agents Toolkit, it is possible to have agents learn from a +varying number of inputs. To do so, each agent can keep track of a buffer +of vector observations. At each step, the agent will go through all the +elements in the buffer and extract information but the elements +in the buffer can change at every step. +This can be useful in scenarios in which the agents must keep track of +a varying number of elements throughout the episode. For example in a game +where an agent must learn to avoid projectiles, but the projectiles can vary in +numbers. + +![Variable Length Observations Illustrated](images/variable-length-observation-illustrated.png) + +You can learn more about variable length observations +[here](Learning-Environment-Design-Agents.md#variable-length-observations). +When variable length observations are utilized, the ML-Agents Toolkit +leverages attention networks to learn from a varying number of entities. +Agents using attention will ignore entities that are deemed not relevant +and pay special attention to entities relevant to the current situation +based on context. + +### Memory-enhanced Agents using Recurrent Neural Networks + +Have you ever entered a room to get something and immediately forgot what you +were looking for? Don't let that happen to your agents. + +![Inspector](images/ml-agents-LSTM.png) + +In some scenarios, agents must learn to remember the past in order to take the +best decision. When an agent only has partial observability of the environment, +keeping track of past observations can help the agent learn. 
Deciding what the +agents should remember in order to solve a task is not easy to do by hand, but +our training algorithms can learn to keep track of what is important to remember +with [LSTM](https://en.wikipedia.org/wiki/Long_short-term_memory). + +## Additional Features + +Beyond the flexible training scenarios available, the ML-Agents Toolkit includes +additional features which improve the flexibility and interpretability of the +training process. + +- **Concurrent Unity Instances** - We enable developers to run concurrent, + parallel instances of the Unity executable during training. For certain + scenarios, this should speed up training. Check out our dedicated page on + [creating a Unity executable](Learning-Environment-Executable.md) and the + [Training ML-Agents](Training-ML-Agents.md#training-using-concurrent-unity-instances) + page for instructions on how to set the number of concurrent instances. +- **Recording Statistics from Unity** - We enable developers to + [record statistics](Learning-Environment-Design.md#recording-statistics) from + within their Unity environments. These statistics are aggregated and generated + during the training process. +- **Custom Side Channels** - We enable developers to + [create custom side channels](Custom-SideChannels.md) to manage data transfer + between Unity and Python that is unique to their training workflow and/or + environment. +- **Custom Samplers** - We enable developers to + [create custom sampling methods](Training-ML-Agents.md#defining-a-new-sampler-type) + for Environment Parameter Randomization. This enables users to customize this + training method for their particular environment. + +## Summary and Next Steps + +To briefly summarize: The ML-Agents Toolkit enables games and simulations built +in Unity to serve as the platform for training intelligent agents. It is +designed to enable a large variety of training modes and scenarios and comes +packed with several features to enable researchers and developers to leverage +(and enhance) machine learning within Unity. + +In terms of next steps: + +- For a walkthrough of running ML-Agents with a simple scene, check out the + [Getting Started](Getting-Started.md) guide. +- For a "Hello World" introduction to creating your own Learning Environment, + check out the + [Making a New Learning Environment](Learning-Environment-Create-New.md) page. +- For an overview on the more complex example environments that are provided in + this toolkit, check out the + [Example Environments](Learning-Environment-Examples.md) page. +- For more information on the various training options available, check out the + [Training ML-Agents](Training-ML-Agents.md) page. 
diff --git a/com.unity.ml-agents/Documentation~/ML-Agents-Toolkit-Documentation.md b/com.unity.ml-agents/Documentation~/ML-Agents-Toolkit-Documentation.md new file mode 100644 index 0000000000..0a1226ca91 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/ML-Agents-Toolkit-Documentation.md @@ -0,0 +1,84 @@ +# Unity ML-Agents Toolkit Documentation + +## Installation & Set-up + +- [Installation](Installation.md) + - [Using Virtual Environment](Using-Virtual-Environment.md) + +## Getting Started + +- [Getting Started Guide](Getting-Started.md) +- [ML-Agents Toolkit Overview](ML-Agents-Overview.md) + - [Background: Unity](Background-Unity.md) + - [Background: Machine Learning](Background-Machine-Learning.md) + - [Background: PyTorch](Background-PyTorch.md) +- [Example Environments](Learning-Environment-Examples.md) + +## Creating Learning Environments + +- [Making a New Learning Environment](Learning-Environment-Create-New.md) +- [Designing a Learning Environment](Learning-Environment-Design.md) + - [Designing Agents](Learning-Environment-Design-Agents.md) +- [Using an Executable Environment](Learning-Environment-Executable.md) +- [ML-Agents Package Settings](Package-Settings.md) + +## Training & Inference + +- [Training ML-Agents](Training-ML-Agents.md) + - [Training Configuration File](Training-Configuration-File.md) + - [Using TensorBoard to Observe Training](Using-Tensorboard.md) + - [Profiling Trainers](Profiling-Python.md) +- [Inference Engine](Inference-Engine.md) + +## Extending ML-Agents + +- [Creating Custom Side Channels](Custom-SideChannels.md) +- [Creating Custom Samplers for Environment Parameter Randomization](Training-ML-Agents.md#defining-a-new-sampler-type) + +## Hugging Face Integration + +- [Using Hugging Face to download and upload trained models](Hugging-Face-Integration.md) + +## Python Tutorial with Google Colab + +- [Using a UnityEnvironment](https://colab.research.google.com/github/Unity-Technologies/ml-agents/blob/release_22_docs/colab/Colab_UnityEnvironment_1_Run.ipynb) +- [Q-Learning with a UnityEnvironment](https://colab.research.google.com/github/Unity-Technologies/ml-agents/blob/release_22_docs/colab/Colab_UnityEnvironment_2_Train.ipynb) +- [Using Side Channels on a UnityEnvironment](https://colab.research.google.com/github/Unity-Technologies/ml-agents/blob/release_22_docs/colab/Colab_UnityEnvironment_3_SideChannel.ipynb) + +## Help + +- [Migrating from earlier versions of ML-Agents](Migrating.md) +- [Frequently Asked Questions](FAQ.md) +- [ML-Agents Glossary](Glossary.md) +- [Limitations](Limitations.md) + +## API Docs + +- [API Reference](API-Reference.md) +- [Python API Documentation](Python-LLAPI-Documentation.md) +- [How to use the Python API](Python-LLAPI.md) +- [How to use the Unity Environment Registry](Unity-Environment-Registry.md) +- [Wrapping Learning Environment as a Gym (+Baselines/Dopamine Integration)](Python-Gym-API.md) + +## Translations + +To make the Unity ML-Agents Toolkit accessible to the global research and Unity +developer communities, we're attempting to create and maintain translations of +our documentation. We've started with translating a subset of the documentation +to one language (Chinese), but we hope to continue translating more pages and to +other languages. Consequently, we welcome any enhancements and improvements from +the community. + +- [Chinese](../localized_docs/zh-CN/) +- [Korean](../localized_docs/KR/) + +## Deprecated Docs + +We no longer use them ourselves and so they may not be up-to-date. 
We've decided +to keep them up just in case they are helpful to you. + +- [Windows Anaconda Installation](Installation-Anaconda-Windows.md) +- [Using Docker](Using-Docker.md) +- [Training on the Cloud with Amazon Web Services](Training-on-Amazon-Web-Service.md) +- [Training on the Cloud with Microsoft Azure](Training-on-Microsoft-Azure.md) +- [Using the Video Recorder](https://github.com/Unity-Technologies/video-recorder) diff --git a/com.unity.ml-agents/Documentation~/Migrating.md b/com.unity.ml-agents/Documentation~/Migrating.md new file mode 100644 index 0000000000..b69276130a --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Migrating.md @@ -0,0 +1,886 @@ +# Upgrading + +# Migrating + +## Migrating to the ml-agents-envs 0.30.0 package +- Python 3.10.12 is now the minimum version of python supported due to [python3.6 EOL](https://endoflife.date/python). + Please update your python installation to 3.10.12 or higher. +- The `gym-unity` package has been refactored into the `ml-agents-envs` package. Please update your imports accordingly. +- Example: + - Before + ```python + from gym_unity.unity_gym_env import UnityToGymWrapper + ``` + - After: + ```python + from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper + ``` + +## Migrating the package to version 3.x +- The official version of Unity ML-Agents supports is now 6000.0. If you run + into issues, please consider deleting your project's Library folder and reopening your + project. + + +## Migrating the package to version 2.x +- The official version of Unity ML-Agents supports is now 2022.3 LTS. If you run + into issues, please consider deleting your project's Library folder and reopening your + project. +- If you used any of the APIs that were deprecated before version 2.0, you need to use their replacement. These +deprecated APIs have been removed. See the migration steps bellow for specific API replacements. + +### Deprecated methods removed +| **Deprecated API** | **Suggested Replacement** | +|:-------:|:------:| +| `IActuator ActuatorComponent.CreateActuator()` | `IActuator[] ActuatorComponent.CreateActuators()` | +| `IActionReceiver.PackActions(in float[] destination)` | none | +| `Agent.CollectDiscreteActionMasks(DiscreteActionMasker actionMasker)` | `Agent.WriteDiscreteActionMask(IDiscreteActionMask actionMask)` | +| `Agent.Heuristic(float[] actionsOut)` | `Agent.Heuristic(in ActionBuffers actionsOut)` | +| `Agent.OnActionReceived(float[] vectorAction)` | `Agent.OnActionReceived(ActionBuffers actions)` | +| `Agent.GetAction()` | `Agent.GetStoredActionBuffers()` | +| `BrainParameters.SpaceType`, `VectorActionSize`, `VectorActionSpaceType`, and `NumActions` | `BrainParameters.ActionSpec` | +| `ObservationWriter.AddRange(IEnumerable data, int writeOffset = 0)` | `ObservationWriter. AddList(IList data, int writeOffset = 0` | +| `SensorComponent.IsVisual()` and `IsVector()` | none | +| `VectorSensor.AddObservation(IEnumerable observation)` | `VectorSensor.AddObservation(IList observation)` | +| `SideChannelsManager` | `SideChannelManager` | + +### IDiscreteActionMask changes +- The interface for disabling specific discrete actions has changed. `IDiscreteActionMask.WriteMask()` was removed, +and replaced with `SetActionEnabled()`. Instead of returning an IEnumerable with indices to disable, you can +now call `SetActionEnabled` for each index to disable (or enable). 
As an example, if you overrode +`Agent.WriteDiscreteActionMask()` with something that looked like: + +```csharp +public override void WriteDiscreteActionMask(IDiscreteActionMask actionMask) +{ + var branch = 2; + var actionsToDisable = new[] {1, 3}; + actionMask.WriteMask(branch, actionsToDisable); +} +``` + +the equivalent code would now be + +```csharp +public override void WriteDiscreteActionMask(IDiscreteActionMask actionMask) +{ + var branch = 2; + actionMask.SetActionEnabled(branch, 1, false); + actionMask.SetActionEnabled(branch, 3, false); +} +``` +### IActuator changes +- The `IActuator` interface now implements `IHeuristicProvider`. Please add the corresponding `Heuristic(in ActionBuffers)` +method to your custom Actuator classes. + +### ISensor and SensorComponent changes +- The `ISensor.GetObservationShape()` method and `ITypedSensor` +and `IDimensionPropertiesSensor` interfaces were removed, and `GetObservationSpec()` was added. You can use +`ObservationSpec.Vector()` or `ObservationSpec.Visual()` to generate `ObservationSpec`s that are equivalent to +the previous shape. For example, if your old ISensor looked like: + +```csharp +public override int[] GetObservationShape() +{ + return new[] { m_Height, m_Width, m_NumChannels }; +} +``` + +the equivalent code would now be + +```csharp +public override ObservationSpec GetObservationSpec() +{ + return ObservationSpec.Visual(m_Height, m_Width, m_NumChannels); +} +``` + +- The `ISensor.GetCompressionType()` method and `ISparseChannelSensor` interface was removed, +and `GetCompressionSpec()` was added. You can use `CompressionSpec.Default()` or +`CompressionSpec.Compressed()` to generate `CompressionSpec`s that are equivalent to + the previous values. For example, if your old ISensor looked like: + ```csharp +public virtual SensorCompressionType GetCompressionType() +{ + return SensorCompressionType.None; +} +``` + +the equivalent code would now be + +```csharp +public CompressionSpec GetCompressionSpec() +{ + return CompressionSpec.Default(); +} +``` + +- The abstract method `SensorComponent.GetObservationShape()` was removed. +- The abstract method `SensorComponent.CreateSensor()` was replaced with `CreateSensors()`, which returns an `ISensor[]`. + +### Match3 integration changes +The Match-3 integration utilities are now included in `com.unity.ml-agents`. + +The `AbstractBoard` interface was changed: +* `AbstractBoard` no longer contains `Rows`, `Columns`, `NumCellTypes`, and `NumSpecialTypes` fields. +* `public abstract BoardSize GetMaxBoardSize()` was added as an abstract method. `BoardSize` is a new struct that +contains `Rows`, `Columns`, `NumCellTypes`, and `NumSpecialTypes` fields, with the same meanings as the old +`AbstractBoard` fields. +* `public virtual BoardSize GetCurrentBoardSize()` is an optional method; by default it returns `GetMaxBoardSize()`. If +you wish to use a single behavior to work with multiple board sizes, override `GetCurrentBoardSize()` to return the +current `BoardSize`. The values returned by `GetCurrentBoardSize()` must be less than or equal to the corresponding +values from `GetMaxBoardSize()`. + +### GridSensor changes +The sensor configuration has changed: +* The sensor implementation has been refactored and existing GridSensor created from extension package +will not work in newer version. Some errors might show up when loading the old sensor in the scene. +You'll need to remove the old sensor and create a new GridSensor. 
+* These parameters names have changed but still refer to the same concept in the sensor: `GridNumSide` -> `GridSize`, +`RotateToAgent` -> `RotateWithAgent`, `ObserveMask` -> `ColliderMask`, `DetectableObjects` -> `DetectableTags` +* `DepthType` (`ChanelBase`/`ChannelHot`) option and `ChannelDepth` are removed. Now the default is +one-hot encoding for detected tag. If you were using original GridSensor without overriding any method, +switching to new GridSensor will produce similar effect for training although the actual observations +will be slightly different. + +For creating your GridSensor implementation with custom data: +* To create custom GridSensor, derive from `GridSensorBase` instead of `GridSensor`. Besides overriding +`GetObjectData()`, you will also need to consider override `GetCellObservationSize()`, `IsDataNormalized()` +and `GetProcessCollidersMethod()` according to the data you collect. Also you'll need to override +`GridSensorComponent.GetGridSensors()` and return your custom GridSensor. +* The input argument `tagIndex` in `GetObjectData()` has changed from 1-indexed to 0-indexed and the +data type changed from `float` to `int`. The index of first detectable tag will be 0 instead of 1. +`normalizedDistance` was removed from input. +* The observation data should be written to the input `dataBuffer` instead of creating and returning a new array. +* Removed the constraint of all data required to be normalized. You should specify it in `IsDataNormalized()`. +Sensors with non-normalized data cannot use PNG compression type. +* The sensor will not further encode the data received from `GetObjectData()` anymore. The values +received from `GetObjectData()` will be the observation sent to the trainer. + +### LSTM models from previous releases no longer supported +The way that Sentis processes LSTM (recurrent neural networks) has changed. As a result, models +trained with previous versions of ML-Agents will not be usable at inference if they were trained with a `memory` +setting in the `.yaml` config file. +If you want to use a model that has a recurrent neural network in this release of ML-Agents, you need to train +the model using the python trainer from this release. + + +## Migrating to Release 13 +### Implementing IHeuristic in your IActuator implementations + - If you have any custom actuators, you can now implement the `IHeuristicProvider` interface to have your actuator + handle the generation of actions when an Agent is running in heuristic mode. +- `VectorSensor.AddObservation(IEnumerable)` is deprecated. Use `VectorSensor.AddObservation(IList)` + instead. +- `ObservationWriter.AddRange()` is deprecated. Use `ObservationWriter.AddList()` instead. +- `ActuatorComponent.CreateActuator()` is deprecated. Please use override `ActuatorComponent.CreateActuators` + instead. Since `ActuatorComponent.CreateActuator()` is abstract, you will still need to override it in your + class until it is removed. It is only ever called if you don't override `ActuatorComponent.CreateActuators`. + You can suppress the warnings by surrounding the method with the following pragma: + ```c# + #pragma warning disable 672 + public IActuator CreateActuator() { ... 
} + #pragma warning restore 672 + ``` + + +# Migrating +## Migrating to Release 11 +### Agent virtual method deprecation + - `Agent.CollectDiscreteActionMasks()` was deprecated and should be replaced with `Agent.WriteDiscreteActionMask()` + - `Agent.Heuristic(float[])` was deprecated and should be replaced with `Agent.Heuristic(ActionBuffers)`. + - `Agent.OnActionReceived(float[])` was deprecated and should be replaced with `Agent.OnActionReceived(ActionBuffers)`. + - `Agent.GetAction()` was deprecated and should be replaced with `Agent.GetStoredActionBuffers()`. + +The default implementation of these will continue to call the deprecated versions where appropriate. However, the +deprecated versions may not be compatible with continuous and discrete actions on the same Agent. + +### BrainParameters field and method deprecation + - `BrainParameters.VectorActionSize` was deprecated; you can now set `BrainParameters.ActionSpec.NumContinuousActions` + or `BrainParameters.ActionSpec.BranchSizes` instead. + - `BrainParameters.VectorActionSpaceType` was deprecated, since both continuous and discrete actions can now be used. + - `BrainParameters.NumActions()` was deprecated. Use `BrainParameters.ActionSpec.NumContinuousActions` and + `BrainParameters.ActionSpec.NumDiscreteActions` instead. + +## Migrating from Release 7 to latest + +### Important changes +- Some trainer files were moved. If you were using the `TrainerFactory` class, it was moved to +the `trainers/trainer` folder. +- The `components` folder containing `bc` and `reward_signals` code was moved to the `trainers/tf` +folder + +### Steps to Migrate +- Replace calls to `from mlagents.trainers.trainer_util import TrainerFactory` to `from mlagents.trainers.trainer import TrainerFactory` +- Replace calls to `from mlagents.trainers.trainer_util import handle_existing_directories` to `from mlagents.trainers.directory_utils import validate_existing_directories` +- Replace `mlagents.trainers.components` with `mlagents.trainers.tf.components` in your import statements. + + +## Migrating from Release 3 to Release 7 + +### Important changes +- The Parameter Randomization feature has been merged with the Curriculum feature. It is now possible to specify a sampler +in the lesson of a Curriculum. Curriculum has been refactored and is now specified at the level of the parameter, not the +behavior. More information +[here](https://github.com/Unity-Technologies/ml-agents/blob/release_22_docs/docs/Training-ML-Agents.md).(#4160) + +### Steps to Migrate +- The configuration format for curriculum and parameter randomization has changed. To upgrade your configuration files, +an upgrade script has been provided. Run `python -m mlagents.trainers.upgrade_config -h` to see the script usage. Note that you will have had to upgrade to/install the current version of ML-Agents before running the script. To update manually: + - If your config file used a `parameter_randomization` section, rename that section to `environment_parameters` + - If your config file used a `curriculum` section, you will need to rewrite your curriculum with this [format](Training-ML-Agents.md#curriculum). + +## Migrating from Release 1 to Release 3 + +### Important changes +- Training artifacts (trained models, summaries) are now found under `results/` + instead of `summaries/` and `models/`. +- Trainer configuration, curriculum configuration, and parameter randomization + configuration have all been moved to a single YAML file. 
(#3791) +- Trainer configuration format has changed, and using a "default" behavior name has + been deprecated. (#3936) +- `max_step` in the `TerminalStep` and `TerminalSteps` objects was renamed `interrupted`. +- On the UnityEnvironment API, `get_behavior_names()` and `get_behavior_specs()` methods were combined into the property `behavior_specs` that contains a mapping from behavior names to behavior spec. +- `use_visual` and `allow_multiple_visual_obs` in the `UnityToGymWrapper` constructor +were replaced by `allow_multiple_obs` which allows one or more visual observations and +vector observations to be used simultaneously. +- `--save-freq` has been removed from the CLI and is now configurable in the trainer configuration + file. +- `--lesson` has been removed from the CLI. Lessons will resume when using `--resume`. + To start at a different lesson, modify your Curriculum configuration. + +### Steps to Migrate +- To upgrade your configuration files, an upgrade script has been provided. Run + `python -m mlagents.trainers.upgrade_config -h` to see the script usage. Note that you will have + had to upgrade to/install the current version of ML-Agents before running the script. + + To do it manually, copy your `` sections from `trainer_config.yaml` into a separate trainer configuration file, under a `behaviors` section. + The `default` section is no longer needed. This new file should be specific to your environment, and not contain + configurations for multiple environments (unless they have the same Behavior Names). + - You will need to reformat your trainer settings as per the [example](Training-ML-Agents.md). + - If your training uses [curriculum](Training-ML-Agents.md#curriculum-learning), move those configurations under a `curriculum` section. + - If your training uses [parameter randomization](Training-ML-Agents.md#environment-parameter-randomization), move + the contents of the sampler config to `parameter_randomization` in the main trainer configuration. +- If you are using `UnityEnvironment` directly, replace `max_step` with `interrupted` + in the `TerminalStep` and `TerminalSteps` objects. + - Replace usage of `get_behavior_names()` and `get_behavior_specs()` in UnityEnvironment with `behavior_specs`. + - If you use the `UnityToGymWrapper`, remove `use_visual` and `allow_multiple_visual_obs` + from the constructor and add `allow_multiple_obs = True` if the environment contains either + both visual and vector observations or multiple visual observations. + - If you were setting `--save-freq` in the CLI, add a `checkpoint_interval` value in your + trainer configuration, and set it equal to `save-freq * n_agents_in_scene`. + +## Migrating from 0.15 to Release 1 + +### Important changes + +- The `MLAgents` C# namespace was renamed to `Unity.MLAgents`, and other nested + namespaces were similarly renamed (#3843). +- The `--load` and `--train` command-line flags have been deprecated and + replaced with `--resume` and `--inference`. +- Running with the same `--run-id` twice will now throw an error. +- The `play_against_current_self_ratio` self-play trainer hyperparameter has + been renamed to `play_against_latest_model_ratio` +- Removed the multi-agent gym option from the gym wrapper. For multi-agent + scenarios, use the [Low Level Python API](Python-LLAPI.md). +- The low level Python API has changed. You can look at the document + [Low Level Python API documentation](Python-LLAPI.md) for more information. If + you use `mlagents-learn` for training, this should be a transparent change. 
+- The obsolete `Agent` methods `GiveModel`, `Done`, `InitializeAgent`, + `AgentAction` and `AgentReset` have been removed. +- The signature of `Agent.Heuristic()` was changed to take a `float[]` as a + parameter, instead of returning the array. This was done to prevent a common + source of error where users would return arrays of the wrong size. +- The SideChannel API has changed (#3833, #3660) : + - Introduced the `SideChannelManager` to register, unregister and access side + channels. + - `EnvironmentParameters` replaces the default `FloatProperties`. You can + access the `EnvironmentParameters` with + `Academy.Instance.EnvironmentParameters` on C#. If you were previously + creating a `UnityEnvironment` in python and passing it a + `FloatPropertiesChannel`, create an `EnvironmentParametersChannel` instead. + - `SideChannel.OnMessageReceived` is now a protected method (was public) + - SideChannel IncomingMessages methods now take an optional default argument, + which is used when trying to read more data than the message contains. + - Added a feature to allow sending stats from C# environments to TensorBoard + (and other python StatsWriters). To do this from your code, use + `Academy.Instance.StatsRecorder.Add(key, value)`(#3660) +- `num_updates` and `train_interval` for SAC have been replaced with + `steps_per_update`. +- The `UnityEnv` class from the `gym-unity` package was renamed + `UnityToGymWrapper` and no longer creates the `UnityEnvironment`. Instead, the + `UnityEnvironment` must be passed as input to the constructor of + `UnityToGymWrapper` +- Public fields and properties on several classes were renamed to follow Unity's + C# style conventions. All public fields and properties now use "PascalCase" + instead of "camelCase"; for example, `Agent.maxStep` was renamed to + `Agent.MaxStep`. For a full list of changes, see the pull request. (#3828) +- `WriteAdapter` was renamed to `ObservationWriter`. (#3834) + +### Steps to Migrate + +- In C# code, replace `using MLAgents` with `using Unity.MLAgents`. Replace + other nested namespaces such as `using MLAgents.Sensors` with + `using Unity.MLAgents.Sensors` +- Replace the `--load` flag with `--resume` when calling `mlagents-learn`, and + don't use the `--train` flag as training will happen by default. To run + without training, use `--inference`. +- To force-overwrite files from a pre-existing run, add the `--force` + command-line flag. +- The Jupyter notebooks have been removed from the repository. +- If your Agent class overrides `Heuristic()`, change the signature to + `public override void Heuristic(float[] actionsOut)` and assign values to + `actionsOut` instead of returning an array. +- If you used `SideChannels` you must: + - Replace `Academy.FloatProperties` with + `Academy.Instance.EnvironmentParameters`. + - `Academy.RegisterSideChannel` and `Academy.UnregisterSideChannel` were + removed. Use `SideChannelManager.RegisterSideChannel` and + `SideChannelManager.UnregisterSideChannel` instead. +- Set `steps_per_update` to be around equal to the number of agents in your + environment, times `num_updates` and divided by `train_interval`. +- Replace `UnityEnv` with `UnityToGymWrapper` in your code. The constructor no + longer takes a file name as input but a fully constructed `UnityEnvironment` + instead. +- Update uses of "camelCase" fields and properties to "PascalCase". 
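+
+For the `Agent.Heuristic()` change above, a minimal before/after sketch
+(assuming an agent with two continuous actions driven by the standard Unity
+input axes):
+
+```csharp
+// Before (0.15): the heuristic allocated and returned a new array.
+public override float[] Heuristic()
+{
+    var action = new float[2];
+    action[0] = Input.GetAxis("Horizontal");
+    action[1] = Input.GetAxis("Vertical");
+    return action;
+}
+
+// After (Release 1): write into the provided buffer instead of returning one.
+public override void Heuristic(float[] actionsOut)
+{
+    actionsOut[0] = Input.GetAxis("Horizontal");
+    actionsOut[1] = Input.GetAxis("Vertical");
+}
+```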
+
+## Migrating from 0.14 to 0.15
+
+### Important changes
+
+- The `Agent.CollectObservations()` virtual method now takes as input a
+  `VectorSensor` sensor as argument. The `Agent.AddVectorObs()` methods were
+  removed.
+- The `SetMask` method must now be called on the `DiscreteActionMasker` argument
+  of the `CollectDiscreteActionMasks` virtual method.
+- We consolidated our API for `DiscreteActionMasker`. `SetMask` takes two
+  arguments: the branch index and the list of masked actions for that branch.
+- The `Monitor` class has been moved to the Examples Project. (It was prone to
+  errors during testing.)
+- The `MLAgents.Sensors` namespace has been introduced. All sensor classes are
+  part of the `MLAgents.Sensors` namespace.
+- The `MLAgents.SideChannels` namespace has been introduced. All side channel
+  classes are part of the `MLAgents.SideChannels` namespace.
+- The interface for `RayPerceptionSensor.PerceiveStatic()` was changed to take
+  an input class and write to an output class, and the method was renamed to
+  `Perceive()`.
+- The method `GetStepCount()` on the Agent class has been replaced with the
+  property getter `StepCount`.
+- The `--multi-gpu` option has been removed temporarily.
+- `AgentInfo.actionMasks` has been renamed to `AgentInfo.discreteActionMasks`.
+- `BrainParameters` and `SpaceType` have been removed from the public API.
+- `BehaviorParameters` has been removed from the public API.
+- `DecisionRequester` has been made internal (you can still use the
+  DecisionRequesterComponent from the inspector). `RepeatAction` was renamed
+  `TakeActionsBetweenDecisions` for clarity.
+- The following methods in the `Agent` class have been renamed. The original
+  method names will be removed in a later release:
+  - `InitializeAgent()` was renamed to `Initialize()`
+  - `AgentAction()` was renamed to `OnActionReceived()`
+  - `AgentReset()` was renamed to `OnEpisodeBegin()`
+  - `Done()` was renamed to `EndEpisode()`
+  - `GiveModel()` was renamed to `SetModel()`
+- The `IFloatProperties` interface has been removed.
+- The interface for SideChannels was changed:
+  - In C#, `OnMessageReceived` now takes an `IncomingMessage` argument, and
+    `QueueMessageToSend` takes an `OutgoingMessage` argument.
+  - In python, `on_message_received` now takes an `IncomingMessage` argument, and
+    `queue_message_to_send` takes an `OutgoingMessage` argument.
+  - Automatic stepping for Academy is now controlled from the
+    AutomaticSteppingEnabled property.
+
+### Steps to Migrate
+
+- Add `using MLAgents.Sensors;` in addition to `using MLAgents;` at the top of
+  your Agent's script.
+- Replace your Agent's implementation of `CollectObservations()` with
+  `CollectObservations(VectorSensor sensor)`. In addition, replace all calls to
+  `AddVectorObs()` with `sensor.AddObservation()` or
+  `sensor.AddOneHotObservation()` on the `VectorSensor` passed as argument.
+- Replace your calls to `SetActionMask` on your Agent with
+  `DiscreteActionMasker.SetActionMask` in `CollectDiscreteActionMasks`.
+- If you call `RayPerceptionSensor.PerceiveStatic()` manually, add your inputs
+  to a `RayPerceptionInput`. To get the previous float array output, iterate
+  through `RayPerceptionOutput.rayOutputs` and call
+  `RayPerceptionOutput.RayOutput.ToFloatArray()`.
+- Replace all calls to `Agent.GetStepCount()` with `Agent.StepCount` +- We strongly recommend replacing the following methods with their new + equivalent as they will be removed in a later release: + - `InitializeAgent()` to `Initialize()` + - `AgentAction()` to `OnActionReceived()` + - `AgentReset()` to `OnEpisodeBegin()` + - `Done()` to `EndEpisode()` + - `GiveModel()` to `SetModel()` +- Replace `IFloatProperties` variables with `FloatPropertiesChannel` variables. +- If you implemented custom `SideChannels`, update the signatures of your + methods, and add your data to the `OutgoingMessage` or read it from the + `IncomingMessage`. +- Replace calls to Academy.EnableAutomaticStepping()/DisableAutomaticStepping() + with Academy.AutomaticSteppingEnabled = true/false. + +## Migrating from 0.13 to 0.14 + +### Important changes + +- The `UnitySDK` folder has been split into a Unity Package + (`com.unity.ml-agents`) and an examples project (`Project`). Please follow the + [Installation Guide](Installation.md) to get up and running with this new repo + structure. +- Several changes were made to how agents are reset and marked as done: + - Calling `Done()` on the Agent will now reset it immediately and call the + `AgentReset` virtual method. (This is to simplify the previous logic in + which the Agent had to wait for the next `EnvironmentStep` to reset) + - The "Reset on Done" setting in AgentParameters was removed; this is now + effectively always true. `AgentOnDone` virtual method on the Agent has been + removed. +- The `Decision Period` and `On Demand decision` checkbox have been removed from + the Agent. On demand decision is now the default (calling `RequestDecision` on + the Agent manually.) +- The Academy class was changed to a singleton, and its virtual methods were + removed. +- Trainer steps are now counted per-Agent, not per-environment as in previous + versions. For instance, if you have 10 Agents in the scene, 20 environment + steps now corresponds to 200 steps as printed in the terminal and in + Tensorboard. +- Curriculum config files are now YAML formatted and all curricula for a + training run are combined into a single file. +- The `--num-runs` command-line option has been removed from `mlagents-learn`. +- Several fields on the Agent were removed or made private in order to simplify + the interface. + - The `agentParameters` field of the Agent has been removed. (Contained only + `maxStep` information) + - `maxStep` is now a public field on the Agent. (Was moved from + `agentParameters`) + - The `Info` field of the Agent has been made private. (Was only used + internally and not meant to be modified outside of the Agent) + - The `GetReward()` method on the Agent has been removed. (It was being + confused with `GetCumulativeReward()`) + - The `AgentAction` struct no longer contains a `value` field. (Value + estimates were not set during inference) + - The `GetValueEstimate()` method on the Agent has been removed. + - The `UpdateValueAction()` method on the Agent has been removed. +- The deprecated `RayPerception3D` and `RayPerception2D` classes were removed, + and the `legacyHitFractionBehavior` argument was removed from + `RayPerceptionSensor.PerceiveStatic()`. +- RayPerceptionSensor was inconsistent in how it handle scale on the Agent's + transform. It now scales the ray length and sphere size for casting as the + transform's scale changes. 
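+
+One practical consequence of the new `Done()` behavior described above is that
+any end-of-episode reward has to be assigned before the episode is ended, as
+the steps below spell out. A minimal sketch (the trigger and tag are
+hypothetical):
+
+```csharp
+// Inside an Agent subclass (0.14): reward first, then end the episode,
+// because Done() now resets the Agent immediately.
+void OnTriggerEnter(Collider other)
+{
+    if (other.CompareTag("goal")) // hypothetical tag
+    {
+        SetReward(1.0f);
+        Done();
+    }
+}
+```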
+ +### Steps to Migrate + +- Follow the instructions on how to install the `com.unity.ml-agents` package + into your project in the [Installation Guide](Installation.md). +- If your Agent implemented `AgentOnDone` and did not have the checkbox + `Reset On Done` checked in the inspector, you must call the code that was in + `AgentOnDone` manually. +- If you give your Agent a reward or penalty at the end of an episode (e.g. for + reaching a goal or falling off of a platform), make sure you call + `AddReward()` or `SetReward()` _before_ calling `Done()`. Previously, the + order didn't matter. +- If you were not using `On Demand Decision` for your Agent, you **must** add a + `DecisionRequester` component to your Agent GameObject and set its + `Decision Period` field to the old `Decision Period` of the Agent. +- If you have a class that inherits from Academy: + - If the class didn't override any of the virtual methods and didn't store any + additional data, you can just remove the old script from the scene. + - If the class had additional data, create a new MonoBehaviour and store the + data in the new MonoBehaviour instead. + - If the class overrode the virtual methods, create a new MonoBehaviour and + move the logic to it: + - Move the InitializeAcademy code to MonoBehaviour.Awake + - Move the AcademyStep code to MonoBehaviour.FixedUpdate + - Move the OnDestroy code to MonoBehaviour.OnDestroy. + - Move the AcademyReset code to a new method and add it to the + Academy.OnEnvironmentReset action. +- Multiply `max_steps` and `summary_freq` in your `trainer_config.yaml` by the + number of Agents in the scene. +- Combine curriculum configs into a single file. See + [the WallJump curricula](https://github.com/Unity-Technologies/ml-agents/blob/0.14.1/config/curricula/wall_jump.yaml) for an example of + the new curriculum config format. A tool like https://www.json2yaml.com may be + useful to help with the conversion. +- If you have a model trained which uses RayPerceptionSensor and has non-1.0 + scale in the Agent's transform, it must be retrained. + +## Migrating from ML-Agents Toolkit v0.12.0 to v0.13.0 + +### Important changes + +- The low level Python API has changed. You can look at the document + [Low Level Python API documentation](Python-LLAPI.md) for more information. This + should only affect you if you're writing a custom trainer; if you use + `mlagents-learn` for training, this should be a transparent change. + - `reset()` on the Low-Level Python API no longer takes a `train_mode` + argument. To modify the performance/speed of the engine, you must use an + `EngineConfigurationChannel` + - `reset()` on the Low-Level Python API no longer takes a `config` argument. + `UnityEnvironment` no longer has a `reset_parameters` field. To modify float + properties in the environment, you must use a `FloatPropertiesChannel`. For + more information, refer to the + [Low Level Python API documentation](Python-LLAPI.md) +- `CustomResetParameters` are now removed. +- The Academy no longer has a `Training Configuration` nor + `Inference Configuration` field in the inspector. To modify the configuration + from the Low-Level Python API, use an `EngineConfigurationChannel`. To modify + it during training, use the new command line arguments `--width`, `--height`, + `--quality-level`, `--time-scale` and `--target-frame-rate` in + `mlagents-learn`. +- The Academy no longer has a `Default Reset Parameters` field in the inspector. + The Academy class no longer has a `ResetParameters`. 
+  To access shared float properties with Python, use the new `FloatProperties`
+  field on the Academy.
+- Offline Behavioral Cloning has been removed. To learn from demonstrations, use
+  the GAIL and Behavioral Cloning features with either PPO or SAC.
+- `mlagents.envs` was renamed to `mlagents_envs`. The previous repo layout
+  depended on [PEP420](https://www.python.org/dev/peps/pep-0420/), which caused
+  problems with some of our tooling such as mypy and pylint.
+- The officially supported Unity version is now 2022.3 LTS. If you run into
+  issues, please consider deleting your Library folder and reopening your
+  project. You will need to install the Sentis package into your project in
+  order for ML-Agents to compile correctly.
+
+### Steps to Migrate
+
+- If you had a custom `Training Configuration` in the Academy inspector, you
+  will need to pass your custom configuration at every training run using the
+  new command line arguments `--width`, `--height`, `--quality-level`,
+  `--time-scale` and `--target-frame-rate`.
+- If you were using `--slow` in `mlagents-learn`, you will need to pass your old
+  `Inference Configuration` of the Academy inspector with the new command line
+  arguments `--width`, `--height`, `--quality-level`, `--time-scale` and
+  `--target-frame-rate` instead.
+- Any imports from `mlagents.envs` should be replaced with `mlagents_envs`.
+
+## Migrating from ML-Agents Toolkit v0.11.0 to v0.12.0
+
+### Important Changes
+
+- Text actions and observations, and custom action and observation protos, have
+  been removed.
+- RayPerception3D and RayPerception2D are marked deprecated and will be removed
+  in a future release. They can be replaced by RayPerceptionSensorComponent3D
+  and RayPerceptionSensorComponent2D.
+- The `Use Heuristic` checkbox in Behavior Parameters has been replaced with a
+  `Behavior Type` dropdown menu. This has the following options:
+  - `Default` corresponds to the previous unchecked behavior, meaning that
+    Agents will train if they connect to a python trainer, otherwise they will
+    perform inference.
+  - `Heuristic Only` means the Agent will always use the `Heuristic()` method.
+    This corresponds to having "Use Heuristic" selected in 0.11.0.
+  - `Inference Only` means the Agent will always perform inference.
+- ML-Agents was upgraded to use Sentis 1.2.0-exp.2 and is installed via the
+  package manager.
+
+### Steps to Migrate
+
+- We [fixed a bug](https://github.com/Unity-Technologies/ml-agents/pull/2823) in
+  `RayPerception3d.Perceive()` that was causing the `endOffset` to be used
+  incorrectly. However, this may produce different behavior from previous
+  versions if you use a non-zero `startOffset`. To reproduce the old behavior,
+  increase the value of `endOffset` by `startOffset`. You can verify that your
+  raycasts are performing as expected in the Scene view using the debug rays.
+- If you use RayPerception3D, replace it with RayPerceptionSensorComponent3D
+  (and similarly for 2D). The settings, such as ray angles and detectable tags,
+  are now configured on the component. RayPerception3D would contribute
+  `(# of rays) * (# of tags + 2)` to the State Size in Behavior Parameters, but
+  this is no longer necessary, so you should reduce the State Size by this
+  amount. Making this change will require retraining your model, since the
+  observations that RayPerceptionSensorComponent3D produces are different from
+  the old behavior.
+- If you see messages such as + `The type or namespace 'Sentis' could not be found` or + `The type or namespace 'Google' could not be found`, you will need to + [install the Sentis preview package](Installation.md#package-installation). + +## Migrating from ML-Agents Toolkit v0.10 to v0.11.0 + +### Important Changes + +- The definition of the gRPC service has changed. +- The online BC training feature has been removed. +- The BroadcastHub has been deprecated. If there is a training Python process, + all LearningBrains in the scene will automatically be trained. If there is no + Python process, inference will be used. +- The Brain ScriptableObjects have been deprecated. The Brain Parameters are now + on the Agent and are referred to as Behavior Parameters. Make sure the + Behavior Parameters is attached to the Agent GameObject. +- To use a heuristic behavior, implement the `Heuristic()` method in the Agent + class and check the `use heuristic` checkbox in the Behavior Parameters. +- Several changes were made to the setup for visual observations (i.e. using + Cameras or RenderTextures): + - Camera resolutions are no longer stored in the Brain Parameters. + - AgentParameters no longer stores lists of Cameras and RenderTextures + - To add visual observations to an Agent, you must now attach a + CameraSensorComponent or RenderTextureComponent to the agent. The + corresponding Camera or RenderTexture can be added to these in the editor, + and the resolution and color/grayscale is configured on the component + itself. + +#### Steps to Migrate + +- In order to be able to train, make sure both your ML-Agents Python package and + UnitySDK code come from the v0.11 release. Training will not work, for + example, if you update the ML-Agents Python package, and only update the API + Version in UnitySDK. +- If your Agents used visual observations, you must add a CameraSensorComponent + corresponding to each old Camera in the Agent's camera list (and similarly for + RenderTextures). +- Since Brain ScriptableObjects have been removed, you will need to delete all + the Brain ScriptableObjects from your `Assets` folder. Then, add a + `Behavior Parameters` component to each `Agent` GameObject. You will then need + to complete the fields on the new `Behavior Parameters` component with the + BrainParameters of the old Brain. + +## Migrating from ML-Agents Toolkit v0.9 to v0.10 + +### Important Changes + +- We have updated the C# code in our repository to be in line with Unity Coding + Conventions. This has changed the name of some public facing classes and + enums. +- The example environments have been updated. If you were using these + environments to benchmark your training, please note that the resulting + rewards may be slightly different in v0.10. + +#### Steps to Migrate + +- `UnitySDK/Assets/ML-Agents/Scripts/Communicator.cs` and its class + `Communicator` have been renamed to + `UnitySDK/Assets/ML-Agents/Scripts/ICommunicator.cs` and `ICommunicator` + respectively. +- The `SpaceType` Enums `discrete`, and `continuous` have been renamed to + `Discrete` and `Continuous`. +- We have removed the `Done` call as well as the capacity to set `Max Steps` on + the Academy. Therefore an AcademyReset will never be triggered from C# (only + from Python). If you want to reset the simulation after a fixed number of + steps, or when an event in the simulation occurs, we recommend looking at our + multi-agent example environments (such as FoodCollector). 
In our examples, + groups of Agents can be reset through an "Area" that can reset groups of + Agents. +- The import for `mlagents.envs.UnityEnvironment` was removed. If you are using + the Python API, change `from mlagents_envs import UnityEnvironment` to + `from mlagents_envs.environment import UnityEnvironment`. + +## Migrating from ML-Agents Toolkit v0.8 to v0.9 + +### Important Changes + +- We have changed the way reward signals (including Curiosity) are defined in + the `trainer_config.yaml`. +- When using multiple environments, every "step" is recorded in TensorBoard. +- The steps in the command line console corresponds to a single step of a single + environment. Previously, each step corresponded to one step for all + environments (i.e., `num_envs` steps). + +#### Steps to Migrate + +- If you were overriding any of these following parameters in your config file, + remove them from the top-level config and follow the steps below: + - `gamma`: Define a new `extrinsic` reward signal and set it's `gamma` to your + new gamma. + - `use_curiosity`, `curiosity_strength`, `curiosity_enc_size`: Define a + `curiosity` reward signal and set its `strength` to `curiosity_strength`, + and `encoding_size` to `curiosity_enc_size`. Give it the same `gamma` as + your `extrinsic` signal to mimic previous behavior. +- TensorBoards generated when running multiple environments in v0.8 are not + comparable to those generated in v0.9 in terms of step count. Multiply your + v0.8 step count by `num_envs` for an approximate comparison. You may need to + change `max_steps` in your config as appropriate as well. + +## Migrating from ML-Agents Toolkit v0.7 to v0.8 + +### Important Changes + +- We have split the Python packages into two separate packages `ml-agents` and + `ml-agents-envs`. +- `--worker-id` option of `learn.py` has been removed, use `--base-port` instead + if you'd like to run multiple instances of `learn.py`. + +#### Steps to Migrate + +- If you are installing via PyPI, there is no change. +- If you intend to make modifications to `ml-agents` or `ml-agents-envs` please + check the Installing for Development in the + [Installation documentation](Installation.md). + +## Migrating from ML-Agents Toolkit v0.6 to v0.7 + +### Important Changes + +- We no longer support TFS and are now using the + [Sentis](Inference-Engine.md) + +#### Steps to Migrate + +- Make sure to remove the `ENABLE_TENSORFLOW` flag in your Unity Project + settings + +## Migrating from ML-Agents Toolkit v0.5 to v0.6 + +### Important Changes + +- Brains are now Scriptable Objects instead of MonoBehaviors. +- You can no longer modify the type of a Brain. If you want to switch between + `PlayerBrain` and `LearningBrain` for multiple agents, you will need to assign + a new Brain to each agent separately. **Note:** You can pass the same Brain to + multiple agents in a scene by leveraging Unity's prefab system or look for all + the agents in a scene using the search bar of the `Hierarchy` window with the + word `Agent`. + +- We replaced the **Internal** and **External** Brain with **Learning Brain**. + When you need to train a model, you need to drag it into the `Broadcast Hub` + inside the `Academy` and check the `Control` checkbox. +- We removed the `Broadcast` checkbox of the Brain, to use the broadcast + functionality, you need to drag the Brain into the `Broadcast Hub`. +- When training multiple Brains at the same time, each model is now stored into + a separate model file rather than in the same file under different graph + scopes. 
+- The **Learning Brain** graph scope, placeholder names, output names and custom + placeholders can no longer be modified. + +#### Steps to Migrate + +- To update a scene from v0.5 to v0.6, you must: + - Remove the `Brain` GameObjects in the scene. (Delete all of the Brain + GameObjects under Academy in the scene.) + - Create new `Brain` Scriptable Objects using `Assets -> Create -> ML-Agents` + for each type of the Brain you plan to use, and put the created files under + a folder called Brains within your project. + - Edit their `Brain Parameters` to be the same as the parameters used in the + `Brain` GameObjects. + - Agents have a `Brain` field in the Inspector, you need to drag the + appropriate Brain ScriptableObject in it. + - The Academy has a `Broadcast Hub` field in the inspector, which is list of + brains used in the scene. To train or control your Brain from the + `mlagents-learn` Python script, you need to drag the relevant + `LearningBrain` ScriptableObjects used in your scene into entries into this + list. + +## Migrating from ML-Agents Toolkit v0.4 to v0.5 + +### Important + +- The Unity project `unity-environment` has been renamed `UnitySDK`. +- The `python` folder has been renamed to `ml-agents`. It now contains two + packages, `mlagents.env` and `mlagents.trainers`. `mlagents.env` can be used + to interact directly with a Unity environment, while `mlagents.trainers` + contains the classes for training agents. +- The supported Unity version has changed from `2017.1 or later` to + `2017.4 or later`. 2017.4 is an LTS (Long Term Support) version that helps us + maintain good quality and support. Earlier versions of Unity might still work, + but you may encounter an + [error](FAQ.md#instance-of-corebraininternal-couldnt-be-created) listed here. + +### Unity API + +- Discrete Actions now use [branches](https://arxiv.org/abs/1711.08946). You can + now specify concurrent discrete actions. You will need to update the Brain + Parameters in the Brain Inspector in all your environments that use discrete + actions. Refer to the + [discrete action documentation](Learning-Environment-Design-Agents.md#discrete-action-space) + for more information. + +### Python API + +- In order to run a training session, you can now use the command + `mlagents-learn` instead of `python3 learn.py` after installing the `mlagents` + packages. This change is documented + [here](Training-ML-Agents.md#training-with-mlagents-learn). For example, if we + previously ran + + ```sh + python3 learn.py 3DBall --train + ``` + + from the `python` subdirectory (which is changed to `ml-agents` subdirectory + in v0.5), we now run + + ```sh + mlagents-learn config/trainer_config.yaml --env=3DBall --train + ``` + + from the root directory where we installed the ML-Agents Toolkit. + +- It is now required to specify the path to the yaml trainer configuration file + when running `mlagents-learn`. For an example trainer configuration file, see + [trainer_config.yaml](https://github.com/Unity-Technologies/ml-agents/blob/0.5.0a/config/trainer_config.yaml). An example of passing a + trainer configuration to `mlagents-learn` is shown above. +- The environment name is now passed through the `--env` option. +- Curriculum learning has been changed. In summary: + - Curriculum files for the same environment must now be placed into a folder. + Each curriculum file should be named after the Brain whose curriculum it + specifies. 
+ - `min_lesson_length` now specifies the minimum number of episodes in a lesson + and affects reward thresholding. + - It is no longer necessary to specify the `Max Steps` of the Academy to use + curriculum learning. + +## Migrating from ML-Agents Toolkit v0.3 to v0.4 + +### Unity API + +- `using MLAgents;` needs to be added in all of the C# scripts that use + ML-Agents. + +### Python API + +- We've changed some of the Python packages dependencies in requirement.txt + file. Make sure to run `pip3 install -e .` within your `ml-agents/python` + folder to update your Python packages. + +## Migrating from ML-Agents Toolkit v0.2 to v0.3 + +There are a large number of new features and improvements in the ML-Agents +toolkit v0.3 which change both the training process and Unity API in ways which +will cause incompatibilities with environments made using older versions. This +page is designed to highlight those changes for users familiar with v0.1 or v0.2 +in order to ensure a smooth transition. + +### Important + +- The ML-Agents Toolkit is no longer compatible with Python 2. + +### Python Training + +- The training script `ppo.py` and `PPO.ipynb` Python notebook have been + replaced with a single `learn.py` script as the launching point for training + with ML-Agents. For more information on using `learn.py`, see + [here](Training-ML-Agents.md#training-with-mlagents-learn). +- Hyperparameters for training Brains are now stored in the + `trainer_config.yaml` file. For more information on using this file, see + [here](Training-ML-Agents.md#training-configurations). + +### Unity API + +- Modifications to an Agent's rewards must now be done using either + `AddReward()` or `SetReward()`. +- Setting an Agent to done now requires the use of the `Done()` method. +- `CollectStates()` has been replaced by `CollectObservations()`, which now no + longer returns a list of floats. +- To collect observations, call `AddVectorObs()` within `CollectObservations()`. + Note that you can call `AddVectorObs()` with floats, integers, lists and + arrays of floats, Vector3 and Quaternions. +- `AgentStep()` has been replaced by `AgentAction()`. +- `WaitTime()` has been removed. +- The `Frame Skip` field of the Academy is replaced by the Agent's + `Decision Frequency` field, enabling the Agent to make decisions at different + frequencies. +- The names of the inputs in the Internal Brain have been changed. You must + replace `state` with `vector_observation` and `observation` with + `visual_observation`. In addition, you must remove the `epsilon` placeholder. + +### Semantics + +In order to more closely align with the terminology used in the Reinforcement +Learning field, and to be more descriptive, we have changed the names of some of +the concepts used in ML-Agents. The changes are highlighted in the table below. + +| Old - v0.2 and earlier | New - v0.3 and later | +| ---------------------- | -------------------- | +| State | Vector Observation | +| Observation | Visual Observation | +| Action | Vector Action | +| N/A | Text Observation | +| N/A | Text Action | diff --git a/com.unity.ml-agents/Documentation~/Package-Settings.md b/com.unity.ml-agents/Documentation~/Package-Settings.md new file mode 100644 index 0000000000..d796e52de2 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Package-Settings.md @@ -0,0 +1,33 @@ +# ML-Agents Package Settings + +ML-Agents Package Settings contains settings that apply to the whole project. +It allows you to configure ML-Agents-specific settings in the Editor. 
+These settings are available for use in both the Editor and Player.
+
+You can find them at `Edit` > `Project Settings...` > `ML-Agents`. It lists all
+the available settings and their default values.
+
+## Create Custom Settings
+In order to use your own settings for your project, you'll need to create a
+settings asset.
+
+You can do this by clicking the `Create Settings Asset` button, or by clicking
+the gear at the top right and selecting `New Settings Asset...`. The asset file
+can be placed anywhere in the `Assets/` folder of your project. After creating
+the settings asset, you'll be able to modify the settings for your project, and
+your settings will be saved in the asset.
+
+![Package Settings](images/package-settings.png)
+
+## Multiple Custom Settings for Different Scenarios
+You can create multiple settings assets in one project.
+
+By clicking the gear at the top right, you'll see all available settings listed
+in the drop-down menu to choose from.
+
+This allows you to create different settings for different scenarios. For
+example, you can create two separate settings for training and inference, and
+specify which one you want to use according to what you're currently running.
+
+![Multiple Settings](images/multiple-settings.png)
diff --git a/com.unity.ml-agents/Documentation~/Profiling-Python.md b/com.unity.ml-agents/Documentation~/Profiling-Python.md
new file mode 100644
index 0000000000..21bc529423
--- /dev/null
+++ b/com.unity.ml-agents/Documentation~/Profiling-Python.md
@@ -0,0 +1,72 @@
+# Profiling in Python
+
+As part of the ML-Agents Toolkit, we provide a lightweight profiling system to
+identify hotspots in the training process and help spot regressions from
+changes.
+
+Timers are hierarchical, meaning that the time tracked in a block of code can be
+further split into other blocks if desired. This also means that a function that
+is called from multiple places in the code will appear in multiple places in the
+timing output.
+
+All timers operate using a "global" instance by default, but this can be
+overridden if necessary (mainly for testing).
+
+## Adding Profiling
+
+There are two ways to indicate that code should be included in profiling. The
+simplest way is to add the `@timed` decorator to a function or method of
+interest.
+
+```python
+class TrainerController:
+    # ....
+    @timed
+    def advance(self, env: EnvManager) -> int:
+        # do stuff
+```
+
+You can also use the `hierarchical_timer` context manager.
+
+```python
+with hierarchical_timer("communicator.exchange"):
+    outputs = self.communicator.exchange(step_input)
+```
+
+The context manager may be easier than the `@timed` decorator for profiling
+different parts of a large function, or for profiling calls to abstract methods
+that might not use the decorator.
+
+## Output
+
+By default, at the end of training, timers are collected and written in JSON
+format to `{summaries_dir}/{run_id}_timers.json`. The output consists of node
+objects with the following keys:
+
+- total (float): The total time in seconds spent in the block, including child
+  calls.
+- count (int): The number of times the block was called.
+- self (float): The total time in seconds spent in the block, excluding child
+  calls.
+- children (dictionary): A dictionary of child nodes, keyed by the node name.
+- is_parallel (bool): Indicates that the block of code was executed in multiple
+  threads or processes (see below). This is optional and defaults to false.
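+
+As a quick way to inspect the output, a short script along the following lines
+can walk the node hierarchy and print each block's total time and call count.
+This is only a sketch: it assumes the root object of the JSON file is itself a
+timer node with the keys listed above, and the file path is an illustrative
+stand-in for your own `{summaries_dir}/{run_id}_timers.json`.
+
+```python
+import json
+
+
+def print_timers(name, node, indent=0):
+    # "total", "count" and "children" are the keys described above.
+    total = node.get("total", 0.0)
+    count = node.get("count", 0)
+    print(f"{'  ' * indent}{name}: {total:.3f}s over {count} call(s)")
+    for child_name, child in node.get("children", {}).items():
+        print_timers(child_name, child, indent + 1)
+
+
+# Replace with the actual {summaries_dir}/{run_id}_timers.json path for your run.
+with open("summaries/my_run_timers.json") as f:
+    print_timers("root", json.load(f))
+```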
+
+### Parallel execution
+
+#### Subprocesses
+
+For code that executes in multiple processes (for example,
+SubprocessEnvManager), we periodically send the timer information back to the
+"main" process, aggregate the timers there, and flush them in the subprocess.
+Note that (depending on the number of processes) this can result in timers where
+the total time may exceed the parent's total time. This is analogous to the
+difference between "real" and "user" values reported from the unix `time`
+command. In the timer output, blocks that were run in parallel are indicated by
+the `is_parallel` flag.
+
+#### Threads
+
+Timers currently use `time.perf_counter()` to track time spent, which may not
+give accurate results for multiple threads. If this is problematic, set
+`threaded: false` in your trainer configuration.
diff --git a/com.unity.ml-agents/Documentation~/Python-Custom-Trainer-Plugin.md b/com.unity.ml-agents/Documentation~/Python-Custom-Trainer-Plugin.md
new file mode 100644
index 0000000000..4c78bfc513
--- /dev/null
+++ b/com.unity.ml-agents/Documentation~/Python-Custom-Trainer-Plugin.md
@@ -0,0 +1,51 @@
+# Unity ML-Agents Custom Trainers Plugin
+
+To bring a wider variety of reinforcement learning algorithms to our users, we have added custom trainer
+capabilities. We introduce an extensible plugin system to define new trainers based on the high-level trainer API
+in the `ML-Agents` package. This allows rerouting the `mlagents-learn` CLI to custom trainers and extending the config files
+with hyperparameters specific to your new trainers. We expose high-level extensible trainer (both on-policy
+and off-policy), optimizer and hyperparameter classes, with documentation for the use of this plugin. For more
+information on how the Python plugin system works, see [Plugin interfaces](Training-Plugins.md).
+
+## Overview
+Model-free RL algorithms generally fall into two broad categories: on-policy and off-policy. On-policy algorithms perform updates based on data gathered from the current policy. Off-policy algorithms learn a Q function from a buffer of previous data, then use this Q function to make decisions. Off-policy algorithms have two key benefits in the context of ML-Agents: they tend to use fewer samples than on-policy algorithms, since they can pull and re-use data from the buffer many times, and they allow player demonstrations to be inserted in-line with RL data into the buffer, enabling new ways of doing imitation learning by streaming player data.
+
+To add new custom trainers to ML-Agents, you need to create a new Python package.
+To give you an idea of how to structure your package, we have created the [mlagents_trainer_plugin](../ml-agents-trainer-plugin) package as an
+example, with implementations of the `A2C` and `DQN` algorithms. You need a `setup.py` file to list extra requirements and
+register the new RL algorithm in the ML-Agents ecosystem so that the `mlagents-learn` CLI can be called with your customized
+configuration.
+
+```shell
+├── mlagents_trainer_plugin
+│   ├── __init__.py
+│   ├── a2c
+│   │   ├── __init__.py
+│   │   ├── a2c_3DBall.yaml
+│   │   ├── a2c_optimizer.py
+│   │   └── a2c_trainer.py
+│   └── dqn
+│       ├── __init__.py
+│       ├── dqn_basic.yaml
+│       ├── dqn_optimizer.py
+│       └── dqn_trainer.py
+└── setup.py
+```
+
+## Installation and Execution
+If you haven't already, follow the [installation instructions](Installation.md). Once you have the `ml-agents-envs` and `ml-agents` packages you can install the plugin package.
From the repository's root directory install `ml-agents-trainer-plugin` (or replace with the name of your plugin folder). + +```sh +pip3 install -e <./ml-agents-trainer-plugin> +``` + +Following the previous installations your package is added as an entrypoint and you can use a config file with new +trainers: +```sh +mlagents-learn ml-agents-trainer-plugin/mlagents_trainer_plugin/a2c/a2c_3DBall.yaml --run-id +--env +``` + +## Tutorial +Here’s a step-by-step [tutorial](Tutorial-Custom-Trainer-Plugin.md) on how to write a setup file and extend ml-agents trainers, optimizers, and +hyperparameter settings.To extend ML-agents classes see references on +[trainers](Python-On-Off-Policy-Trainer-Documentation.md) and [Optimizer](Python-Optimizer-Documentation.md). diff --git a/com.unity.ml-agents/Documentation~/Python-Gym-API-Documentation.md b/com.unity.ml-agents/Documentation~/Python-Gym-API-Documentation.md new file mode 100644 index 0000000000..b35771fc46 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Python-Gym-API-Documentation.md @@ -0,0 +1,161 @@ +# Table of Contents + +* [mlagents\_envs.envs.unity\_gym\_env](#mlagents_envs.envs.unity_gym_env) + * [UnityGymException](#mlagents_envs.envs.unity_gym_env.UnityGymException) + * [UnityToGymWrapper](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper) + * [\_\_init\_\_](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.__init__) + * [reset](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.reset) + * [step](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.step) + * [render](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.render) + * [close](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.close) + * [seed](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.seed) + * [ActionFlattener](#mlagents_envs.envs.unity_gym_env.ActionFlattener) + * [\_\_init\_\_](#mlagents_envs.envs.unity_gym_env.ActionFlattener.__init__) + * [lookup\_action](#mlagents_envs.envs.unity_gym_env.ActionFlattener.lookup_action) + + +# mlagents\_envs.envs.unity\_gym\_env + + +## UnityGymException Objects + +```python +class UnityGymException(error.Error) +``` + +Any error related to the gym wrapper of ml-agents. + + +## UnityToGymWrapper Objects + +```python +class UnityToGymWrapper(gym.Env) +``` + +Provides Gym wrapper for Unity Learning Environments. + + +#### \_\_init\_\_ + +```python + | __init__(unity_env: BaseEnv, uint8_visual: bool = False, flatten_branched: bool = False, allow_multiple_obs: bool = False, action_space_seed: Optional[int] = None) +``` + +Environment initialization + +**Arguments**: + +- `unity_env`: The Unity BaseEnv to be wrapped in the gym. Will be closed when the UnityToGymWrapper closes. +- `uint8_visual`: Return visual observations as uint8 (0-255) matrices instead of float (0.0-1.0). +- `flatten_branched`: If True, turn branched discrete action spaces into a Discrete space rather than + MultiDiscrete. +- `allow_multiple_obs`: If True, return a list of np.ndarrays as observations with the first elements + containing the visual observations and the last element containing the array of vector observations. + If False, returns a single np.ndarray containing either only a single visual observation or the array of + vector observations. +- `action_space_seed`: If non-None, will be used to set the random seed on created gym.Space instances. + + +#### reset + +```python + | reset() -> Union[List[np.ndarray], np.ndarray] +``` + +Resets the state of the environment and returns an initial observation. 
+Returns: observation (object/list): the initial observation of the +space. + + +#### step + +```python + | step(action: List[Any]) -> GymStepResult +``` + +Run one timestep of the environment's dynamics. When end of +episode is reached, you are responsible for calling `reset()` +to reset this environment's state. +Accepts an action and returns a tuple (observation, reward, done, info). + +**Arguments**: + +- `action` _object/list_ - an action provided by the environment + +**Returns**: + +- `observation` _object/list_ - agent's observation of the current environment + reward (float/list) : amount of reward returned after previous action +- `done` _boolean/list_ - whether the episode has ended. +- `info` _dict_ - contains auxiliary diagnostic information. + + +#### render + +```python + | render(mode="rgb_array") +``` + +Return the latest visual observations. +Note that it will not render a new frame of the environment. + + +#### close + +```python + | close() -> None +``` + +Override _close in your subclass to perform any necessary cleanup. +Environments will automatically close() themselves when +garbage collected or when the program exits. + + +#### seed + +```python + | seed(seed: Any = None) -> None +``` + +Sets the seed for this env's random number generator(s). +Currently not implemented. + + +## ActionFlattener Objects + +```python +class ActionFlattener() +``` + +Flattens branched discrete action spaces into single-branch discrete action spaces. + + +#### \_\_init\_\_ + +```python + | __init__(branched_action_space) +``` + +Initialize the flattener. + +**Arguments**: + +- `branched_action_space`: A List containing the sizes of each branch of the action +space, e.g. [2,3,3] for three branches with size 2, 3, and 3 respectively. + + +#### lookup\_action + +```python + | lookup_action(action) +``` + +Convert a scalar discrete action into a unique set of branched actions. + +**Arguments**: + +- `action`: A scalar value representing one of the discrete actions. + +**Returns**: + +The List containing the branched actions. diff --git a/com.unity.ml-agents/Documentation~/Python-Gym-API.md b/com.unity.ml-agents/Documentation~/Python-Gym-API.md new file mode 100644 index 0000000000..97869899ce --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Python-Gym-API.md @@ -0,0 +1,357 @@ +# Unity ML-Agents Gym Wrapper + +A common way in which machine learning researchers interact with simulation +environments is via a wrapper provided by OpenAI called `gym`. For more +information on the gym interface, see [here](https://github.com/openai/gym). + +We provide a gym wrapper and instructions for using it with existing machine +learning algorithms which utilize gym. Our wrapper provides interfaces on top of +our `UnityEnvironment` class, which is the default way of interfacing with a +Unity environment via Python. + +## Installation + +The gym wrapper is part of the `mlagents_envs` package. Please refer to the +[mlagents_envs installation instructions](ML-Agents-Envs-README.md). + + +## Using the Gym Wrapper + +The gym interface is available from `gym_unity.envs`. To launch an environment +from the root of the project repository use: + +```python +from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper + +env = UnityToGymWrapper(unity_env, uint8_visual, flatten_branched, allow_multiple_obs) +``` + +- `unity_env` refers to the Unity environment to be wrapped. + +- `uint8_visual` refers to whether to output visual observations as `uint8` + values (0-255). Many common Gym environments (e.g. 
Atari) do this. By default + they will be floats (0.0-1.0). Defaults to `False`. + +- `flatten_branched` will flatten a branched discrete action space into a Gym + Discrete. Otherwise, it will be converted into a MultiDiscrete. Defaults to + `False`. + +- `allow_multiple_obs` will return a list of observations. The first elements + contain the visual observations and the last element contains the array of + vector observations. If False the environment returns a single array (containing + a single visual observations, if present, otherwise the vector observation). + Defaults to `False`. + +- `action_space_seed` is the optional seed for action sampling. If non-None, will + be used to set the random seed on created gym.Space instances. + +The returned environment `env` will function as a gym. + +## Limitations + +- It is only possible to use an environment with a **single** Agent. +- By default, the first visual observation is provided as the `observation`, if + present. Otherwise, vector observations are provided. You can receive all + visual and vector observations by using the `allow_multiple_obs=True` option in + the gym parameters. If set to `True`, you will receive a list of `observation` + instead of only one. +- The `TerminalSteps` or `DecisionSteps` output from the environment can still + be accessed from the `info` provided by `env.step(action)`. +- Stacked vector observations are not supported. +- Environment registration for use with `gym.make()` is currently not supported. +- Calling env.render() will not render a new frame of the environment. It will + return the latest visual observation if using visual observations. + +## Running OpenAI Baselines Algorithms + +OpenAI provides a set of open-source maintained and tested Reinforcement +Learning algorithms called the [Baselines](https://github.com/openai/baselines). + +Using the provided Gym wrapper, it is possible to train ML-Agents environments +using these algorithms. This requires the creation of custom training scripts to +launch each algorithm. In most cases these scripts can be created by making +slight modifications to the ones provided for Atari and Mujoco environments. + +These examples were tested with baselines version 0.1.6. + +### Example - DQN Baseline + +In order to train an agent to play the `GridWorld` environment using the +Baselines DQN algorithm, you first need to install the baselines package using +pip: + +``` +pip install git+git://github.com/openai/baselines +``` + +Next, create a file called `train_unity.py`. Then create an `/envs/` directory +and build the environment to that directory. For more information on +building Unity environments, see +[here](../docs/Learning-Environment-Executable.md). Note that because of +limitations of the DQN baseline, the environment must have a single visual +observation, a single discrete action and a single Agent in the scene. 
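+
+Before writing the full training script, it can be useful to sanity-check the
+wrapped environment with a short random-action loop. The sketch below assumes
+the GridWorld executable was built to `envs/GridWorld` (adjust the path to your
+own build):
+
+```python
+from mlagents_envs.environment import UnityEnvironment
+from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
+
+unity_env = UnityEnvironment("envs/GridWorld")  # path to your built executable
+env = UnityToGymWrapper(unity_env, uint8_visual=True)
+
+obs = env.reset()
+for _ in range(10):
+    # Sample a random discrete action and step the wrapped environment.
+    obs, reward, done, info = env.step(env.action_space.sample())
+    if done:
+        obs = env.reset()
+env.close()
+```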
+Add the following code to the `train_unity.py` file: + +```python +import gym + +from baselines import deepq +from baselines import logger + +from mlagents_envs.environment import UnityEnvironment +from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper + + +def main(): + unity_env = UnityEnvironment( < path - to - environment >) + env = UnityToGymWrapper(unity_env, uint8_visual=True) + logger.configure('./logs') # Change to log in a different directory + act = deepq.learn( + env, + "cnn", # For visual inputs + lr=2.5e-4, + total_timesteps=1000000, + buffer_size=50000, + exploration_fraction=0.05, + exploration_final_eps=0.1, + print_freq=20, + train_freq=5, + learning_starts=20000, + target_network_update_freq=50, + gamma=0.99, + prioritized_replay=False, + checkpoint_freq=1000, + checkpoint_path='./logs', # Change to save model in a different directory + dueling=True + ) + print("Saving model to unity_model.pkl") + act.save("unity_model.pkl") + + +if __name__ == '__main__': + main() +``` + +To start the training process, run the following from the directory containing +`train_unity.py`: + +```sh +python -m train_unity +``` + +### Other Algorithms + +Other algorithms in the Baselines repository can be run using scripts similar to +the examples from the baselines package. In most cases, the primary changes +needed to use a Unity environment are to import `UnityToGymWrapper`, and to +replace the environment creation code, typically `gym.make()`, with a call to +`UnityToGymWrapper(unity_environment)` passing the environment as input. + +A typical rule of thumb is that for vision-based environments, modification +should be done to Atari training scripts, and for vector observation +environments, modification should be done to Mujoco scripts. + +Some algorithms will make use of `make_env()` or `make_mujoco_env()` functions. +You can define a similar function for Unity environments. An example of such a +method using the PPO2 baseline: + +```python +from mlagents_envs.environment import UnityEnvironment +from mlagents_envs.envs import UnityToGymWrapper +from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv +from baselines.common.vec_env.dummy_vec_env import DummyVecEnv +from baselines.bench import Monitor +from baselines import logger +import baselines.ppo2.ppo2 as ppo2 + +import os + +try: + from mpi4py import MPI +except ImportError: + MPI = None + + +def make_unity_env(env_directory, num_env, visual, start_index=0): + """ + Create a wrapped, monitored Unity environment. + """ + + def make_env(rank, use_visual=True): # pylint: disable=C0111 + def _thunk(): + unity_env = UnityEnvironment(env_directory, base_port=5000 + rank) + env = UnityToGymWrapper(unity_env, uint8_visual=True) + env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank))) + return env + + return _thunk + + if visual: + return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)]) + else: + rank = MPI.COMM_WORLD.Get_rank() if MPI else 0 + return DummyVecEnv([make_env(rank, use_visual=False)]) + + +def main(): + env = make_unity_env( < path - to - environment >, 4, True) + ppo2.learn( + network="mlp", + env=env, + total_timesteps=100000, + lr=1e-3, + ) + + +if __name__ == '__main__': + main() +``` + +## Run Google Dopamine Algorithms + +Google provides a framework [Dopamine](https://github.com/google/dopamine), and +implementations of algorithms, e.g. DQN, Rainbow, and the C51 variant of +Rainbow. Using the Gym wrapper, we can run Unity environments using Dopamine. 
+ +First, after installing the Gym wrapper, clone the Dopamine repository. + +``` +git clone https://github.com/google/dopamine +``` + +Then, follow the appropriate install instructions as specified on +[Dopamine's homepage](https://github.com/google/dopamine). Note that the +Dopamine guide specifies using a virtualenv. If you choose to do so, make sure +your unity_env package is also installed within the same virtualenv as Dopamine. + +### Adapting Dopamine's Scripts + +First, open `dopamine/atari/run_experiment.py`. Alternatively, copy the entire +`atari` folder, and name it something else (e.g. `unity`). If you choose the +copy approach, be sure to change the package names in the import statements in +`train.py` to your new directory. + +Within `run_experiment.py`, we will need to make changes to which environment is +instantiated, just as in the Baselines example. At the top of the file, insert + +```python +from mlagents_envs.environment import UnityEnvironment +from mlagents_envs.envs import UnityToGymWrapper +``` + +to import the Gym Wrapper. Navigate to the `create_atari_environment` method in +the same file, and switch to instantiating a Unity environment by replacing the +method with the following code. + +```python + game_version = 'v0' if sticky_actions else 'v4' + full_game_name = '{}NoFrameskip-{}'.format(game_name, game_version) + unity_env = UnityEnvironment() + env = UnityToGymWrapper(unity_env, uint8_visual=True) + return env +``` + +`` is the path to your built Unity executable. For more +information on building Unity environments, see +[here](../docs/Learning-Environment-Executable.md), and note the Limitations +section below. + +Note that we are not using the preprocessor from Dopamine, as it uses many +Atari-specific calls. Furthermore, frame-skipping can be done from within Unity, +rather than on the Python side. + +### Limitations + +Since Dopamine is designed around variants of DQN, it is only compatible with +discrete action spaces, and specifically the Discrete Gym space. For +environments that use branched discrete action spaces, you can enable the +`flatten_branched` parameter in `UnityToGymWrapper`, which treats each +combination of branched actions as separate actions. + +Furthermore, when building your environments, ensure that your Agent is using +visual observations with greyscale enabled, and that the dimensions of the +visual observations is 84 by 84 (matches the parameter found in `dqn_agent.py` +and `rainbow_agent.py`). Dopamine's agents currently do not automatically adapt +to the observation dimensions or number of channels. + +### Hyperparameters + +The hyperparameters provided by Dopamine are tailored to the Atari games, and +you will likely need to adjust them for ML-Agents environments. Here is a sample +`dopamine/agents/rainbow/configs/rainbow.gin` file that is known to work with +a simple GridWorld. + +```python +import dopamine.agents.rainbow.rainbow_agent +import dopamine.unity.run_experiment +import dopamine.replay_memory.prioritized_replay_buffer +import gin.tf.external_configurables + +RainbowAgent.num_atoms = 51 +RainbowAgent.stack_size = 1 +RainbowAgent.vmax = 10. 
+RainbowAgent.gamma = 0.99 +RainbowAgent.update_horizon = 3 +RainbowAgent.min_replay_history = 20000 # agent steps +RainbowAgent.update_period = 5 +RainbowAgent.target_update_period = 50 # agent steps +RainbowAgent.epsilon_train = 0.1 +RainbowAgent.epsilon_eval = 0.01 +RainbowAgent.epsilon_decay_period = 50000 # agent steps +RainbowAgent.replay_scheme = 'prioritized' +RainbowAgent.tf_device = '/cpu:0' # use '/cpu:*' for non-GPU version +RainbowAgent.optimizer = @tf.train.AdamOptimizer() + +tf.train.AdamOptimizer.learning_rate = 0.00025 +tf.train.AdamOptimizer.epsilon = 0.0003125 + +Runner.game_name = "Unity" # any name can be used here +Runner.sticky_actions = False +Runner.num_iterations = 200 +Runner.training_steps = 10000 # agent steps +Runner.evaluation_steps = 500 # agent steps +Runner.max_steps_per_episode = 27000 # agent steps + +WrappedPrioritizedReplayBuffer.replay_capacity = 1000000 +WrappedPrioritizedReplayBuffer.batch_size = 32 +``` + +This example assumed you copied `atari` to a separate folder named `unity`. +Replace `unity` in `import dopamine.unity.run_experiment` with the folder you +copied your `run_experiment.py` and `trainer.py` files to. If you directly +modified the existing files, then use `atari` here. + +### Starting a Run + +You can now run Dopamine as you would normally: + +``` +python -um dopamine.unity.train \ + --agent_name=rainbow \ + --base_dir=/tmp/dopamine \ + --gin_files='dopamine/agents/rainbow/configs/rainbow.gin' +``` + +Again, we assume that you've copied `atari` into a separate folder. Remember to +replace `unity` with the directory you copied your files into. If you edited the +Atari files directly, this should be `atari`. + +### Example: GridWorld + +As a baseline, here are rewards over time for the three algorithms provided with +Dopamine as run on the GridWorld example environment. All Dopamine (DQN, +Rainbow, C51) runs were done with the same epsilon, epsilon decay, replay +history, training steps, and buffer settings as specified above. Note that the +first 20000 steps are used to pre-fill the training buffer, and no learning +happens. + +We provide results from our PPO implementation and the DQN from Baselines as +reference. Note that all runs used the same greyscale GridWorld as Dopamine. For +PPO, `num_layers` was set to 2, and all other hyperparameters are the default +for GridWorld in `config/ppo/GridWorld.yaml`. For Baselines DQN, the provided +hyperparameters in the previous section are used. Note that Baselines implements +certain features (e.g. dueling-Q) that are not enabled in Dopamine DQN. 
+ + +![Dopamine on GridWorld](images/dopamine_gridworld_plot.png) + diff --git a/com.unity.ml-agents/Documentation~/Python-LLAPI-Documentation.md b/com.unity.ml-agents/Documentation~/Python-LLAPI-Documentation.md new file mode 100644 index 0000000000..9cba2f9c07 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Python-LLAPI-Documentation.md @@ -0,0 +1,1362 @@ +# Table of Contents + +* [mlagents\_envs.base\_env](#mlagents_envs.base_env) + * [DecisionStep](#mlagents_envs.base_env.DecisionStep) + * [DecisionSteps](#mlagents_envs.base_env.DecisionSteps) + * [agent\_id\_to\_index](#mlagents_envs.base_env.DecisionSteps.agent_id_to_index) + * [\_\_getitem\_\_](#mlagents_envs.base_env.DecisionSteps.__getitem__) + * [empty](#mlagents_envs.base_env.DecisionSteps.empty) + * [TerminalStep](#mlagents_envs.base_env.TerminalStep) + * [TerminalSteps](#mlagents_envs.base_env.TerminalSteps) + * [agent\_id\_to\_index](#mlagents_envs.base_env.TerminalSteps.agent_id_to_index) + * [\_\_getitem\_\_](#mlagents_envs.base_env.TerminalSteps.__getitem__) + * [empty](#mlagents_envs.base_env.TerminalSteps.empty) + * [ActionTuple](#mlagents_envs.base_env.ActionTuple) + * [discrete\_dtype](#mlagents_envs.base_env.ActionTuple.discrete_dtype) + * [ActionSpec](#mlagents_envs.base_env.ActionSpec) + * [is\_discrete](#mlagents_envs.base_env.ActionSpec.is_discrete) + * [is\_continuous](#mlagents_envs.base_env.ActionSpec.is_continuous) + * [discrete\_size](#mlagents_envs.base_env.ActionSpec.discrete_size) + * [empty\_action](#mlagents_envs.base_env.ActionSpec.empty_action) + * [random\_action](#mlagents_envs.base_env.ActionSpec.random_action) + * [create\_continuous](#mlagents_envs.base_env.ActionSpec.create_continuous) + * [create\_discrete](#mlagents_envs.base_env.ActionSpec.create_discrete) + * [create\_hybrid](#mlagents_envs.base_env.ActionSpec.create_hybrid) + * [DimensionProperty](#mlagents_envs.base_env.DimensionProperty) + * [UNSPECIFIED](#mlagents_envs.base_env.DimensionProperty.UNSPECIFIED) + * [NONE](#mlagents_envs.base_env.DimensionProperty.NONE) + * [TRANSLATIONAL\_EQUIVARIANCE](#mlagents_envs.base_env.DimensionProperty.TRANSLATIONAL_EQUIVARIANCE) + * [VARIABLE\_SIZE](#mlagents_envs.base_env.DimensionProperty.VARIABLE_SIZE) + * [ObservationType](#mlagents_envs.base_env.ObservationType) + * [DEFAULT](#mlagents_envs.base_env.ObservationType.DEFAULT) + * [GOAL\_SIGNAL](#mlagents_envs.base_env.ObservationType.GOAL_SIGNAL) + * [ObservationSpec](#mlagents_envs.base_env.ObservationSpec) + * [BehaviorSpec](#mlagents_envs.base_env.BehaviorSpec) + * [BaseEnv](#mlagents_envs.base_env.BaseEnv) + * [step](#mlagents_envs.base_env.BaseEnv.step) + * [reset](#mlagents_envs.base_env.BaseEnv.reset) + * [close](#mlagents_envs.base_env.BaseEnv.close) + * [behavior\_specs](#mlagents_envs.base_env.BaseEnv.behavior_specs) + * [set\_actions](#mlagents_envs.base_env.BaseEnv.set_actions) + * [set\_action\_for\_agent](#mlagents_envs.base_env.BaseEnv.set_action_for_agent) + * [get\_steps](#mlagents_envs.base_env.BaseEnv.get_steps) +* [mlagents\_envs.environment](#mlagents_envs.environment) + * [UnityEnvironment](#mlagents_envs.environment.UnityEnvironment) + * [\_\_init\_\_](#mlagents_envs.environment.UnityEnvironment.__init__) + * [close](#mlagents_envs.environment.UnityEnvironment.close) +* [mlagents\_envs.registry](#mlagents_envs.registry) +* [mlagents\_envs.registry.unity\_env\_registry](#mlagents_envs.registry.unity_env_registry) + * [UnityEnvRegistry](#mlagents_envs.registry.unity_env_registry.UnityEnvRegistry) + * 
[register](#mlagents_envs.registry.unity_env_registry.UnityEnvRegistry.register) + * [register\_from\_yaml](#mlagents_envs.registry.unity_env_registry.UnityEnvRegistry.register_from_yaml) + * [clear](#mlagents_envs.registry.unity_env_registry.UnityEnvRegistry.clear) + * [\_\_getitem\_\_](#mlagents_envs.registry.unity_env_registry.UnityEnvRegistry.__getitem__) +* [mlagents\_envs.side\_channel](#mlagents_envs.side_channel) +* [mlagents\_envs.side\_channel.raw\_bytes\_channel](#mlagents_envs.side_channel.raw_bytes_channel) + * [RawBytesChannel](#mlagents_envs.side_channel.raw_bytes_channel.RawBytesChannel) + * [on\_message\_received](#mlagents_envs.side_channel.raw_bytes_channel.RawBytesChannel.on_message_received) + * [get\_and\_clear\_received\_messages](#mlagents_envs.side_channel.raw_bytes_channel.RawBytesChannel.get_and_clear_received_messages) + * [send\_raw\_data](#mlagents_envs.side_channel.raw_bytes_channel.RawBytesChannel.send_raw_data) +* [mlagents\_envs.side\_channel.outgoing\_message](#mlagents_envs.side_channel.outgoing_message) + * [OutgoingMessage](#mlagents_envs.side_channel.outgoing_message.OutgoingMessage) + * [\_\_init\_\_](#mlagents_envs.side_channel.outgoing_message.OutgoingMessage.__init__) + * [write\_bool](#mlagents_envs.side_channel.outgoing_message.OutgoingMessage.write_bool) + * [write\_int32](#mlagents_envs.side_channel.outgoing_message.OutgoingMessage.write_int32) + * [write\_float32](#mlagents_envs.side_channel.outgoing_message.OutgoingMessage.write_float32) + * [write\_float32\_list](#mlagents_envs.side_channel.outgoing_message.OutgoingMessage.write_float32_list) + * [write\_string](#mlagents_envs.side_channel.outgoing_message.OutgoingMessage.write_string) + * [set\_raw\_bytes](#mlagents_envs.side_channel.outgoing_message.OutgoingMessage.set_raw_bytes) +* [mlagents\_envs.side\_channel.engine\_configuration\_channel](#mlagents_envs.side_channel.engine_configuration_channel) + * [EngineConfigurationChannel](#mlagents_envs.side_channel.engine_configuration_channel.EngineConfigurationChannel) + * [on\_message\_received](#mlagents_envs.side_channel.engine_configuration_channel.EngineConfigurationChannel.on_message_received) + * [set\_configuration\_parameters](#mlagents_envs.side_channel.engine_configuration_channel.EngineConfigurationChannel.set_configuration_parameters) + * [set\_configuration](#mlagents_envs.side_channel.engine_configuration_channel.EngineConfigurationChannel.set_configuration) +* [mlagents\_envs.side\_channel.side\_channel\_manager](#mlagents_envs.side_channel.side_channel_manager) + * [SideChannelManager](#mlagents_envs.side_channel.side_channel_manager.SideChannelManager) + * [process\_side\_channel\_message](#mlagents_envs.side_channel.side_channel_manager.SideChannelManager.process_side_channel_message) + * [generate\_side\_channel\_messages](#mlagents_envs.side_channel.side_channel_manager.SideChannelManager.generate_side_channel_messages) +* [mlagents\_envs.side\_channel.stats\_side\_channel](#mlagents_envs.side_channel.stats_side_channel) + * [StatsSideChannel](#mlagents_envs.side_channel.stats_side_channel.StatsSideChannel) + * [on\_message\_received](#mlagents_envs.side_channel.stats_side_channel.StatsSideChannel.on_message_received) + * [get\_and\_reset\_stats](#mlagents_envs.side_channel.stats_side_channel.StatsSideChannel.get_and_reset_stats) +* [mlagents\_envs.side\_channel.incoming\_message](#mlagents_envs.side_channel.incoming_message) + * [IncomingMessage](#mlagents_envs.side_channel.incoming_message.IncomingMessage) + * 
[\_\_init\_\_](#mlagents_envs.side_channel.incoming_message.IncomingMessage.__init__) + * [read\_bool](#mlagents_envs.side_channel.incoming_message.IncomingMessage.read_bool) + * [read\_int32](#mlagents_envs.side_channel.incoming_message.IncomingMessage.read_int32) + * [read\_float32](#mlagents_envs.side_channel.incoming_message.IncomingMessage.read_float32) + * [read\_float32\_list](#mlagents_envs.side_channel.incoming_message.IncomingMessage.read_float32_list) + * [read\_string](#mlagents_envs.side_channel.incoming_message.IncomingMessage.read_string) + * [get\_raw\_bytes](#mlagents_envs.side_channel.incoming_message.IncomingMessage.get_raw_bytes) +* [mlagents\_envs.side\_channel.float\_properties\_channel](#mlagents_envs.side_channel.float_properties_channel) + * [FloatPropertiesChannel](#mlagents_envs.side_channel.float_properties_channel.FloatPropertiesChannel) + * [on\_message\_received](#mlagents_envs.side_channel.float_properties_channel.FloatPropertiesChannel.on_message_received) + * [set\_property](#mlagents_envs.side_channel.float_properties_channel.FloatPropertiesChannel.set_property) + * [get\_property](#mlagents_envs.side_channel.float_properties_channel.FloatPropertiesChannel.get_property) + * [list\_properties](#mlagents_envs.side_channel.float_properties_channel.FloatPropertiesChannel.list_properties) + * [get\_property\_dict\_copy](#mlagents_envs.side_channel.float_properties_channel.FloatPropertiesChannel.get_property_dict_copy) +* [mlagents\_envs.side\_channel.environment\_parameters\_channel](#mlagents_envs.side_channel.environment_parameters_channel) + * [EnvironmentParametersChannel](#mlagents_envs.side_channel.environment_parameters_channel.EnvironmentParametersChannel) + * [set\_float\_parameter](#mlagents_envs.side_channel.environment_parameters_channel.EnvironmentParametersChannel.set_float_parameter) + * [set\_uniform\_sampler\_parameters](#mlagents_envs.side_channel.environment_parameters_channel.EnvironmentParametersChannel.set_uniform_sampler_parameters) + * [set\_gaussian\_sampler\_parameters](#mlagents_envs.side_channel.environment_parameters_channel.EnvironmentParametersChannel.set_gaussian_sampler_parameters) + * [set\_multirangeuniform\_sampler\_parameters](#mlagents_envs.side_channel.environment_parameters_channel.EnvironmentParametersChannel.set_multirangeuniform_sampler_parameters) +* [mlagents\_envs.side\_channel.side\_channel](#mlagents_envs.side_channel.side_channel) + * [SideChannel](#mlagents_envs.side_channel.side_channel.SideChannel) + * [queue\_message\_to\_send](#mlagents_envs.side_channel.side_channel.SideChannel.queue_message_to_send) + * [on\_message\_received](#mlagents_envs.side_channel.side_channel.SideChannel.on_message_received) + * [channel\_id](#mlagents_envs.side_channel.side_channel.SideChannel.channel_id) + + +# mlagents\_envs.base\_env + +Python Environment API for the ML-Agents Toolkit +The aim of this API is to expose Agents evolving in a simulation +to perform reinforcement learning on. +This API supports multi-agent scenarios and groups similar Agents (same +observations, actions spaces and behavior) together. These groups of Agents are +identified by their BehaviorName. +For performance reasons, the data of each group of agents is processed in a +batched manner. Agents are identified by a unique AgentId identifier that +allows tracking of Agents across simulation steps. Note that there is no +guarantee that the number or order of the Agents in the state will be +consistent across simulation steps. 
+A simulation steps corresponds to moving the simulation forward until at least +one agent in the simulation sends its observations to Python again. Since +Agents can request decisions at different frequencies, a simulation step does +not necessarily correspond to a fixed simulation time increment. + + +## DecisionStep Objects + +```python +class DecisionStep(NamedTuple) +``` + +Contains the data a single Agent collected since the last +simulation step. + - obs is a list of numpy arrays observations collected by the agent. + - reward is a float. Corresponds to the rewards collected by the agent + since the last simulation step. + - agent_id is an int and an unique identifier for the corresponding Agent. + - action_mask is an optional list of one dimensional array of booleans. + Only available when using multi-discrete actions. + Each array corresponds to an action branch. Each array contains a mask + for each action of the branch. If true, the action is not available for + the agent during this simulation step. + + +## DecisionSteps Objects + +```python +class DecisionSteps(Mapping) +``` + +Contains the data a batch of similar Agents collected since the last +simulation step. Note that all Agents do not necessarily have new +information to send at each simulation step. Therefore, the ordering of +agents and the batch size of the DecisionSteps are not fixed across +simulation steps. + - obs is a list of numpy arrays observations collected by the batch of + agent. Each obs has one extra dimension compared to DecisionStep: the + first dimension of the array corresponds to the batch size of the batch. + - reward is a float vector of length batch size. Corresponds to the + rewards collected by each agent since the last simulation step. + - agent_id is an int vector of length batch size containing unique + identifier for the corresponding Agent. This is used to track Agents + across simulation steps. + - action_mask is an optional list of two dimensional array of booleans. + Only available when using multi-discrete actions. + Each array corresponds to an action branch. The first dimension of each + array is the batch size and the second contains a mask for each action of + the branch. If true, the action is not available for the agent during + this simulation step. + + +#### agent\_id\_to\_index + +```python + | @property + | agent_id_to_index() -> Dict[AgentId, int] +``` + +**Returns**: + +A Dict that maps agent_id to the index of those agents in +this DecisionSteps. + + +#### \_\_getitem\_\_ + +```python + | __getitem__(agent_id: AgentId) -> DecisionStep +``` + +returns the DecisionStep for a specific agent. + +**Arguments**: + +- `agent_id`: The id of the agent + +**Returns**: + +The DecisionStep + + +#### empty + +```python + | @staticmethod + | empty(spec: "BehaviorSpec") -> "DecisionSteps" +``` + +Returns an empty DecisionSteps. + +**Arguments**: + +- `spec`: The BehaviorSpec for the DecisionSteps + + +## TerminalStep Objects + +```python +class TerminalStep(NamedTuple) +``` + +Contains the data a single Agent collected when its episode ended. + - obs is a list of numpy arrays observations collected by the agent. + - reward is a float. Corresponds to the rewards collected by the agent + since the last simulation step. + - interrupted is a bool. Is true if the Agent was interrupted since the last + decision step. For example, if the Agent reached the maximum number of steps for + the episode. + - agent_id is an int and an unique identifier for the corresponding Agent. 
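+
+As a minimal usage sketch (assuming a single registered behavior, and using only
+the calls documented on this page: `behavior_specs`, `get_steps`, `set_actions`,
+`step` and `ActionSpec.random_action`), the step objects are typically consumed
+like this:
+
+```python
+from mlagents_envs.environment import UnityEnvironment
+
+env = UnityEnvironment()  # connects to an Editor or default-port environment
+env.reset()
+behavior_name = list(env.behavior_specs)[0]
+spec = env.behavior_specs[behavior_name]
+
+for _ in range(100):
+    decision_steps, terminal_steps = env.get_steps(behavior_name)
+    # Agents listed in terminal_steps ended their episode during the last step.
+    for agent_id in terminal_steps:
+        terminal_step = terminal_steps[agent_id]
+        print(agent_id, terminal_step.reward, terminal_step.interrupted)
+    # Agents listed in decision_steps expect an action for the next step.
+    if len(decision_steps) > 0:
+        actions = spec.action_spec.random_action(len(decision_steps))
+        env.set_actions(behavior_name, actions)
+    env.step()
+
+env.close()
+```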
+ + +## TerminalSteps Objects + +```python +class TerminalSteps(Mapping) +``` + +Contains the data a batch of Agents collected when their episode +terminated. All Agents present in the TerminalSteps have ended their +episode. + - obs is a list of numpy arrays observations collected by the batch of + agent. Each obs has one extra dimension compared to DecisionStep: the + first dimension of the array corresponds to the batch size of the batch. + - reward is a float vector of length batch size. Corresponds to the + rewards collected by each agent since the last simulation step. + - interrupted is an array of booleans of length batch size. Is true if the + associated Agent was interrupted since the last decision step. For example, if the + Agent reached the maximum number of steps for the episode. + - agent_id is an int vector of length batch size containing unique + identifier for the corresponding Agent. This is used to track Agents + across simulation steps. + + +#### agent\_id\_to\_index + +```python + | @property + | agent_id_to_index() -> Dict[AgentId, int] +``` + +**Returns**: + +A Dict that maps agent_id to the index of those agents in +this TerminalSteps. + + +#### \_\_getitem\_\_ + +```python + | __getitem__(agent_id: AgentId) -> TerminalStep +``` + +returns the TerminalStep for a specific agent. + +**Arguments**: + +- `agent_id`: The id of the agent + +**Returns**: + +obs, reward, done, agent_id and optional action mask for a +specific agent + + +#### empty + +```python + | @staticmethod + | empty(spec: "BehaviorSpec") -> "TerminalSteps" +``` + +Returns an empty TerminalSteps. + +**Arguments**: + +- `spec`: The BehaviorSpec for the TerminalSteps + + +## ActionTuple Objects + +```python +class ActionTuple(_ActionTupleBase) +``` + +An object whose fields correspond to actions of different types. +Continuous and discrete actions are numpy arrays of type float32 and +int32, respectively and are type checked on construction. +Dimensions are of (n_agents, continuous_size) and (n_agents, discrete_size), +respectively. Note, this also holds when continuous or discrete size is +zero. + + +#### discrete\_dtype + +```python + | @property + | discrete_dtype() -> np.dtype +``` + +The dtype of a discrete action. + + +## ActionSpec Objects + +```python +class ActionSpec(NamedTuple) +``` + +A NamedTuple containing utility functions and information about the action spaces +for a group of Agents under the same behavior. +- num_continuous_actions is an int corresponding to the number of floats which +constitute the action. +- discrete_branch_sizes is a Tuple of int where each int corresponds to +the number of discrete actions available to the agent on an independent action branch. + + +#### is\_discrete + +```python + | is_discrete() -> bool +``` + +Returns true if this Behavior uses discrete actions + + +#### is\_continuous + +```python + | is_continuous() -> bool +``` + +Returns true if this Behavior uses continuous actions + + +#### discrete\_size + +```python + | @property + | discrete_size() -> int +``` + +Returns a an int corresponding to the number of discrete branches. + + +#### empty\_action + +```python + | empty_action(n_agents: int) -> ActionTuple +``` + +Generates ActionTuple corresponding to an empty action (all zeros) +for a number of agents. 
+ +**Arguments**: + +- `n_agents`: The number of agents that will have actions generated + + +#### random\_action + +```python + | random_action(n_agents: int) -> ActionTuple +``` + +Generates ActionTuple corresponding to a random action (either discrete +or continuous) for a number of agents. + +**Arguments**: + +- `n_agents`: The number of agents that will have actions generated + + +#### create\_continuous + +```python + | @staticmethod + | create_continuous(continuous_size: int) -> "ActionSpec" +``` + +Creates an ActionSpec that is homogenously continuous + + +#### create\_discrete + +```python + | @staticmethod + | create_discrete(discrete_branches: Tuple[int]) -> "ActionSpec" +``` + +Creates an ActionSpec that is homogenously discrete + + +#### create\_hybrid + +```python + | @staticmethod + | create_hybrid(continuous_size: int, discrete_branches: Tuple[int]) -> "ActionSpec" +``` + +Creates a hybrid ActionSpace + + +## DimensionProperty Objects + +```python +class DimensionProperty(IntFlag) +``` + +The dimension property of a dimension of an observation. + + +#### UNSPECIFIED + +No properties specified. + + +#### NONE + +No Property of the observation in that dimension. Observation can be processed with +Fully connected networks. + + +#### TRANSLATIONAL\_EQUIVARIANCE + +Means it is suitable to do a convolution in this dimension. + + +#### VARIABLE\_SIZE + +Means that there can be a variable number of observations in this dimension. +The observations are unordered. + + +## ObservationType Objects + +```python +class ObservationType(Enum) +``` + +An Enum which defines the type of information carried in the observation +of the agent. + + +#### DEFAULT + +Observation information is generic. + + +#### GOAL\_SIGNAL + +Observation contains goal information for current task. + + +## ObservationSpec Objects + +```python +class ObservationSpec(NamedTuple) +``` + +A NamedTuple containing information about the observation of Agents. +- shape is a Tuple of int : It corresponds to the shape of +an observation's dimensions. +- dimension_property is a Tuple of DimensionProperties flag, one flag for each +dimension. +- observation_type is an enum of ObservationType. + + +## BehaviorSpec Objects + +```python +class BehaviorSpec(NamedTuple) +``` + +A NamedTuple containing information about the observation and action +spaces for a group of Agents under the same behavior. +- observation_specs is a List of ObservationSpec NamedTuple containing +information about the information of the Agent's observations such as their shapes. +The order of the ObservationSpec is the same as the order of the observations of an +agent. +- action_spec is an ActionSpec NamedTuple. + + +## BaseEnv Objects + +```python +class BaseEnv(ABC) +``` + + +#### step + +```python + | @abstractmethod + | step() -> None +``` + +Signals the environment that it must move the simulation forward +by one step. + + +#### reset + +```python + | @abstractmethod + | reset() -> None +``` + +Signals the environment that it must reset the simulation. + + +#### close + +```python + | @abstractmethod + | close() -> None +``` + +Signals the environment that it must close. + + +#### behavior\_specs + +```python + | @property + | @abstractmethod + | behavior_specs() -> MappingType[str, BehaviorSpec] +``` + +Returns a Mapping from behavior names to behavior specs. +Agents grouped under the same behavior name have the same action and +observation specs, and are expected to behave similarly in the +environment. 
+Note that new keys can be added to this mapping as new policies are instantiated. + + +#### set\_actions + +```python + | @abstractmethod + | set_actions(behavior_name: BehaviorName, action: ActionTuple) -> None +``` + +Sets the action for all of the agents in the simulation for the next +step. The Actions must be in the same order as the order received in +the DecisionSteps. + +**Arguments**: + +- `behavior_name`: The name of the behavior the agents are part of +- `action`: ActionTuple tuple of continuous and/or discrete action. +Actions are np.arrays with dimensions (n_agents, continuous_size) and +(n_agents, discrete_size), respectively. + + +#### set\_action\_for\_agent + +```python + | @abstractmethod + | set_action_for_agent(behavior_name: BehaviorName, agent_id: AgentId, action: ActionTuple) -> None +``` + +Sets the action for one of the agents in the simulation for the next +step. + +**Arguments**: + +- `behavior_name`: The name of the behavior the agent is part of +- `agent_id`: The id of the agent the action is set for +- `action`: ActionTuple tuple of continuous and/or discrete action +Actions are np.arrays with dimensions (1, continuous_size) and +(1, discrete_size), respectively. Note, this initial dimensions of 1 is because +this action is meant for a single agent. + + +#### get\_steps + +```python + | @abstractmethod + | get_steps(behavior_name: BehaviorName) -> Tuple[DecisionSteps, TerminalSteps] +``` + +Retrieves the steps of the agents that requested a step in the +simulation. + +**Arguments**: + +- `behavior_name`: The name of the behavior the agents are part of + +**Returns**: + +A tuple containing : +- A DecisionSteps NamedTuple containing the observations, +the rewards, the agent ids and the action masks for the Agents +of the specified behavior. These Agents need an action this step. +- A TerminalSteps NamedTuple containing the observations, +rewards, agent ids and interrupted flags of the agents that had their +episode terminated last step. + + +# mlagents\_envs.environment + + +## UnityEnvironment Objects + +```python +class UnityEnvironment(BaseEnv) +``` + + +#### \_\_init\_\_ + +```python + | __init__(file_name: Optional[str] = None, worker_id: int = 0, base_port: Optional[int] = None, seed: int = 0, no_graphics: bool = False, no_graphics_monitor: bool = False, timeout_wait: int = 60, additional_args: Optional[List[str]] = None, side_channels: Optional[List[SideChannel]] = None, log_folder: Optional[str] = None, num_areas: int = 1) +``` + +Starts a new unity environment and establishes a connection with the environment. +Notice: Currently communication between Unity and Python takes place over an open socket without authentication. +Ensure that the network where training takes place is secure. + +:string file_name: Name of Unity environment binary. :int base_port: Baseline port number to connect to Unity +environment over. worker_id increments over this. If no environment is specified (i.e. file_name is None), +the DEFAULT_EDITOR_PORT will be used. :int worker_id: Offset from base_port. Used for training multiple +environments simultaneously. :bool no_graphics: Whether to run the Unity simulator in no-graphics mode :bool +no_graphics_monitor: Whether to run the main worker in graphics mode, with the remaining in no-graphics mode +:int timeout_wait: Time (in seconds) to wait for connection from environment. 
:list args: Addition Unity +command line arguments :list side_channels: Additional side channel for no-rl communication with Unity :str +log_folder: Optional folder to write the Unity Player log file into. Requires absolute path. + + +#### close + +```python + | close() +``` + +Sends a shutdown signal to the unity environment, and closes the socket connection. + + +# mlagents\_envs.registry + + +# mlagents\_envs.registry.unity\_env\_registry + + +## UnityEnvRegistry Objects + +```python +class UnityEnvRegistry(Mapping) +``` + +### UnityEnvRegistry +Provides a library of Unity environments that can be launched without the need +of downloading the Unity Editor. +The UnityEnvRegistry implements a Map, to access an entry of the Registry, use: +```python +registry = UnityEnvRegistry() +entry = registry[] +``` +An entry has the following properties : + * `identifier` : Uniquely identifies this environment + * `expected_reward` : Corresponds to the reward an agent must obtained for the task + to be considered completed. + * `description` : A human readable description of the environment. + +To launch a Unity environment from a registry entry, use the `make` method: +```python +registry = UnityEnvRegistry() +env = registry[].make() +``` + + +#### register + +```python + | register(new_entry: BaseRegistryEntry) -> None +``` + +Registers a new BaseRegistryEntry to the registry. The +BaseRegistryEntry.identifier value will be used as indexing key. +If two are more environments are registered under the same key, the most +recentry added will replace the others. + + +#### register\_from\_yaml + +```python + | register_from_yaml(path_to_yaml: str) -> None +``` + +Registers the environments listed in a yaml file (either local or remote). Note +that the entries are registered lazily: the registration will only happen when +an environment is accessed. +The yaml file must have the following format : +```yaml +environments: +- : + expected_reward: + description: | + + linux_url: + darwin_url: + win_url: + +- : + expected_reward: + description: | + + linux_url: + darwin_url: + win_url: + +- ... +``` + +**Arguments**: + +- `path_to_yaml`: A local path or url to the yaml file + + +#### clear + +```python + | clear() -> None +``` + +Deletes all entries in the registry. + + +#### \_\_getitem\_\_ + +```python + | __getitem__(identifier: str) -> BaseRegistryEntry +``` + +Returns the BaseRegistryEntry with the provided identifier. BaseRegistryEntry +can then be used to make a Unity Environment. + +**Arguments**: + +- `identifier`: The identifier of the BaseRegistryEntry + +**Returns**: + +The associated BaseRegistryEntry + + +# mlagents\_envs.side\_channel + + +# mlagents\_envs.side\_channel.raw\_bytes\_channel + + +## RawBytesChannel Objects + +```python +class RawBytesChannel(SideChannel) +``` + +This is an example of what the SideChannel for raw bytes exchange would +look like. Is meant to be used for general research purpose. + + +#### on\_message\_received + +```python + | on_message_received(msg: IncomingMessage) -> None +``` + +Is called by the environment to the side channel. Can be called +multiple times per step if multiple messages are meant for that +SideChannel. + + +#### get\_and\_clear\_received\_messages + +```python + | get_and_clear_received_messages() -> List[bytes] +``` + +returns a list of bytearray received from the environment. + + +#### send\_raw\_data + +```python + | send_raw_data(data: bytearray) -> None +``` + +Queues a message to be sent by the environment at the next call to +step. 
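+
+As a brief illustration (not part of the generated reference) of how the
+methods above fit together; the UUID below is an arbitrary placeholder that
+must match the channel id registered on the C# side:
+
+```python
+import uuid
+
+from mlagents_envs.environment import UnityEnvironment
+from mlagents_envs.side_channel.raw_bytes_channel import RawBytesChannel
+
+# Illustrative sketch: the UUID is a placeholder and must match the C# side.
+channel = RawBytesChannel(uuid.UUID("621f0a70-4f87-11ea-a6bf-784f4387d1f7"))
+# With no file_name, this waits for the Play button in the Unity Editor.
+env = UnityEnvironment(side_channels=[channel])
+env.reset()
+channel.send_raw_data(bytearray(b"hello"))  # queued, sent on the next step/reset
+env.step()
+received = channel.get_and_clear_received_messages()  # List[bytes]
+env.close()
+```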
+ + +# mlagents\_envs.side\_channel.outgoing\_message + + +## OutgoingMessage Objects + +```python +class OutgoingMessage() +``` + +Utility class for forming the message that is written to a SideChannel. +All data is written in little-endian format using the struct module. + + +#### \_\_init\_\_ + +```python + | __init__() +``` + +Create an OutgoingMessage with an empty buffer. + + +#### write\_bool + +```python + | write_bool(b: bool) -> None +``` + +Append a boolean value. + + +#### write\_int32 + +```python + | write_int32(i: int) -> None +``` + +Append an integer value. + + +#### write\_float32 + +```python + | write_float32(f: float) -> None +``` + +Append a float value. It will be truncated to 32-bit precision. + + +#### write\_float32\_list + +```python + | write_float32_list(float_list: List[float]) -> None +``` + +Append a list of float values. They will be truncated to 32-bit precision. + + +#### write\_string + +```python + | write_string(s: str) -> None +``` + +Append a string value. Internally, it will be encoded to ascii, and the +encoded length will also be written to the message. + + +#### set\_raw\_bytes + +```python + | set_raw_bytes(buffer: bytearray) -> None +``` + +Set the internal buffer to a new bytearray. This will overwrite any existing data. + +**Arguments**: + +- `buffer`: + +**Returns**: + + + + +# mlagents\_envs.side\_channel.engine\_configuration\_channel + + +## EngineConfigurationChannel Objects + +```python +class EngineConfigurationChannel(SideChannel) +``` + +This is the SideChannel for engine configuration exchange. The data in the +engine configuration is as follows : + - int width; + - int height; + - int qualityLevel; + - float timeScale; + - int targetFrameRate; + - int captureFrameRate; + + +#### on\_message\_received + +```python + | on_message_received(msg: IncomingMessage) -> None +``` + +Is called by the environment to the side channel. Can be called +multiple times per step if multiple messages are meant for that +SideChannel. +Note that Python should never receive an engine configuration from +Unity + + +#### set\_configuration\_parameters + +```python + | set_configuration_parameters(width: Optional[int] = None, height: Optional[int] = None, quality_level: Optional[int] = None, time_scale: Optional[float] = None, target_frame_rate: Optional[int] = None, capture_frame_rate: Optional[int] = None) -> None +``` + +Sets the engine configuration. Takes as input the configurations of the +engine. + +**Arguments**: + +- `width`: Defines the width of the display. (Must be set alongside height) +- `height`: Defines the height of the display. (Must be set alongside width) +- `quality_level`: Defines the quality level of the simulation. +- `time_scale`: Defines the multiplier for the deltatime in the +simulation. If set to a higher value, time will pass faster in the +simulation but the physics might break. +- `target_frame_rate`: Instructs simulation to try to render at a +specified frame rate. +- `capture_frame_rate`: Instructs the simulation to consider time between +updates to always be constant, regardless of the actual frame rate. + + +#### set\_configuration + +```python + | set_configuration(config: EngineConfig) -> None +``` + +Sets the engine configuration. Takes as input an EngineConfig. 
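+
+A short usage sketch (not part of the generated reference) tying the two
+methods above to `UnityEnvironment`; the binary name below is a placeholder,
+and `file_name=None` can be used to connect to the Editor instead:
+
+```python
+from mlagents_envs.environment import UnityEnvironment
+from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel
+
+# Illustrative sketch: "3DBall" is a placeholder binary name.
+channel = EngineConfigurationChannel()
+env = UnityEnvironment(file_name="3DBall", side_channels=[channel])
+# Run the simulation faster than real time at a reduced quality level.
+channel.set_configuration_parameters(time_scale=20.0, quality_level=0)
+env.reset()
+```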
+ + +# mlagents\_envs.side\_channel.side\_channel\_manager + + +## SideChannelManager Objects + +```python +class SideChannelManager() +``` + + +#### process\_side\_channel\_message + +```python + | process_side_channel_message(data: bytes) -> None +``` + +Separates the data received from Python into individual messages for each +registered side channel and calls on_message_received on them. + +**Arguments**: + +- `data`: The packed message sent by Unity + + +#### generate\_side\_channel\_messages + +```python + | generate_side_channel_messages() -> bytearray +``` + +Gathers the messages that the registered side channels will send to Unity +and combines them into a single message ready to be sent. + + +# mlagents\_envs.side\_channel.stats\_side\_channel + + +## StatsSideChannel Objects + +```python +class StatsSideChannel(SideChannel) +``` + +Side channel that receives (string, float) pairs from the environment, so that they can eventually +be passed to a StatsReporter. + + +#### on\_message\_received + +```python + | on_message_received(msg: IncomingMessage) -> None +``` + +Receive the message from the environment, and save it for later retrieval. + +**Arguments**: + +- `msg`: + +**Returns**: + + + + +#### get\_and\_reset\_stats + +```python + | get_and_reset_stats() -> EnvironmentStats +``` + +Returns the current stats, and resets the internal storage of the stats. + +**Returns**: + + + + +# mlagents\_envs.side\_channel.incoming\_message + + +## IncomingMessage Objects + +```python +class IncomingMessage() +``` + +Utility class for reading the message written to a SideChannel. +Values must be read in the order they were written. + + +#### \_\_init\_\_ + +```python + | __init__(buffer: bytes, offset: int = 0) +``` + +Create a new IncomingMessage from the bytes. + + +#### read\_bool + +```python + | read_bool(default_value: bool = False) -> bool +``` + +Read a boolean value from the message buffer. + +**Arguments**: + +- `default_value`: Default value to use if the end of the message is reached. + +**Returns**: + +The value read from the message, or the default value if the end was reached. + + +#### read\_int32 + +```python + | read_int32(default_value: int = 0) -> int +``` + +Read an integer value from the message buffer. + +**Arguments**: + +- `default_value`: Default value to use if the end of the message is reached. + +**Returns**: + +The value read from the message, or the default value if the end was reached. + + +#### read\_float32 + +```python + | read_float32(default_value: float = 0.0) -> float +``` + +Read a float value from the message buffer. + +**Arguments**: + +- `default_value`: Default value to use if the end of the message is reached. + +**Returns**: + +The value read from the message, or the default value if the end was reached. + + +#### read\_float32\_list + +```python + | read_float32_list(default_value: List[float] = None) -> List[float] +``` + +Read a list of float values from the message buffer. + +**Arguments**: + +- `default_value`: Default value to use if the end of the message is reached. + +**Returns**: + +The value read from the message, or the default value if the end was reached. + + +#### read\_string + +```python + | read_string(default_value: str = "") -> str +``` + +Read a string value from the message buffer. + +**Arguments**: + +- `default_value`: Default value to use if the end of the message is reached. + +**Returns**: + +The value read from the message, or the default value if the end was reached. 
+ + +#### get\_raw\_bytes + +```python + | get_raw_bytes() -> bytes +``` + +Get a copy of the internal bytes used by the message. + + +# mlagents\_envs.side\_channel.float\_properties\_channel + + +## FloatPropertiesChannel Objects + +```python +class FloatPropertiesChannel(SideChannel) +``` + +This is the SideChannel for float properties shared with Unity. +You can modify the float properties of an environment with the commands +set_property, get_property and list_properties. + + +#### on\_message\_received + +```python + | on_message_received(msg: IncomingMessage) -> None +``` + +Is called by the environment to the side channel. Can be called +multiple times per step if multiple messages are meant for that +SideChannel. + + +#### set\_property + +```python + | set_property(key: str, value: float) -> None +``` + +Sets a property in the Unity Environment. + +**Arguments**: + +- `key`: The string identifier of the property. +- `value`: The float value of the property. + + +#### get\_property + +```python + | get_property(key: str) -> Optional[float] +``` + +Gets a property in the Unity Environment. If the property was not +found, will return None. + +**Arguments**: + +- `key`: The string identifier of the property. + +**Returns**: + +The float value of the property or None. + + +#### list\_properties + +```python + | list_properties() -> List[str] +``` + +Returns a list of all the string identifiers of the properties +currently present in the Unity Environment. + + +#### get\_property\_dict\_copy + +```python + | get_property_dict_copy() -> Dict[str, float] +``` + +Returns a copy of the float properties. + +**Returns**: + + + + +# mlagents\_envs.side\_channel.environment\_parameters\_channel + + +## EnvironmentParametersChannel Objects + +```python +class EnvironmentParametersChannel(SideChannel) +``` + +This is the SideChannel for sending environment parameters to Unity. +You can send parameters to an environment with the command +set_float_parameter. + + +#### set\_float\_parameter + +```python + | set_float_parameter(key: str, value: float) -> None +``` + +Sets a float environment parameter in the Unity Environment. + +**Arguments**: + +- `key`: The string identifier of the parameter. +- `value`: The float value of the parameter. + + +#### set\_uniform\_sampler\_parameters + +```python + | set_uniform_sampler_parameters(key: str, min_value: float, max_value: float, seed: int) -> None +``` + +Sets a uniform environment parameter sampler. + +**Arguments**: + +- `key`: The string identifier of the parameter. +- `min_value`: The minimum of the sampling distribution. +- `max_value`: The maximum of the sampling distribution. +- `seed`: The random seed to initialize the sampler. + + +#### set\_gaussian\_sampler\_parameters + +```python + | set_gaussian_sampler_parameters(key: str, mean: float, st_dev: float, seed: int) -> None +``` + +Sets a gaussian environment parameter sampler. + +**Arguments**: + +- `key`: The string identifier of the parameter. +- `mean`: The mean of the sampling distribution. +- `st_dev`: The standard deviation of the sampling distribution. +- `seed`: The random seed to initialize the sampler. + + +#### set\_multirangeuniform\_sampler\_parameters + +```python + | set_multirangeuniform_sampler_parameters(key: str, intervals: List[Tuple[float, float]], seed: int) -> None +``` + +Sets a multirangeuniform environment parameter sampler. + +**Arguments**: + +- `key`: The string identifier of the parameter. 
+- `intervals`: The lists of min and max that define each uniform distribution. +- `seed`: The random seed to initialize the sampler. + + +# mlagents\_envs.side\_channel.side\_channel + + +## SideChannel Objects + +```python +class SideChannel(ABC) +``` + +The side channel just get access to a bytes buffer that will be shared +between C# and Python. For example, We will create a specific side channel +for properties that will be a list of string (fixed size) to float number, +that can be modified by both C# and Python. All side channels are passed +to the Env object at construction. + + +#### queue\_message\_to\_send + +```python + | queue_message_to_send(msg: OutgoingMessage) -> None +``` + +Queues a message to be sent by the environment at the next call to +step. + + +#### on\_message\_received + +```python + | @abstractmethod + | on_message_received(msg: IncomingMessage) -> None +``` + +Is called by the environment to the side channel. Can be called +multiple times per step if multiple messages are meant for that +SideChannel. + + +#### channel\_id + +```python + | @property + | channel_id() -> uuid.UUID +``` + +**Returns**: + +The type of side channel used. Will influence how the data is +processed in the environment. diff --git a/com.unity.ml-agents/Documentation~/Python-LLAPI.md b/com.unity.ml-agents/Documentation~/Python-LLAPI.md new file mode 100644 index 0000000000..ace4a9a15d --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Python-LLAPI.md @@ -0,0 +1,354 @@ +# Unity ML-Agents Python Low Level API + +The `mlagents` Python package contains two components: a low level API which +allows you to interact directly with a Unity Environment (`mlagents_envs`) and +an entry point to train (`mlagents-learn`) which allows you to train agents in +Unity Environments using our implementations of reinforcement learning or +imitation learning. This document describes how to use the `mlagents_envs` API. +For information on using `mlagents-learn`, see [here](Training-ML-Agents.md). +For Python Low Level API documentation, see [here](Python-LLAPI-Documentation.md). + +The Python Low Level API can be used to interact directly with your Unity +learning environment. As such, it can serve as the basis for developing and +evaluating new learning algorithms. + +## mlagents_envs + +The ML-Agents Toolkit Low Level API is a Python API for controlling the +simulation loop of an environment or game built with Unity. This API is used by +the training algorithms inside the ML-Agent Toolkit, but you can also write your +own Python programs using this API. + +The key objects in the Python API include: + +- **UnityEnvironment** — the main interface between the Unity application and + your code. Use UnityEnvironment to start and control a simulation or training + session. +- **BehaviorName** - is a string that identifies a behavior in the simulation. +- **AgentId** - is an `int` that serves as unique identifier for Agents in the + simulation. +- **DecisionSteps** — contains the data from Agents belonging to the same + "Behavior" in the simulation, such as observations and rewards. Only Agents + that requested a decision since the last call to `env.step()` are in the + DecisionSteps object. +- **TerminalSteps** — contains the data from Agents belonging to the same + "Behavior" in the simulation, such as observations and rewards. Only Agents + whose episode ended since the last call to `env.step()` are in the + TerminalSteps object. 
+- **BehaviorSpec** — describes the shape of the observation data inside
+  DecisionSteps and TerminalSteps as well as the expected action shapes.
+
+These classes are all defined in the
+[base_env](../ml-agents-envs/mlagents_envs/base_env.py) script.
+
+An Agent "Behavior" is a group of Agents identified by a `BehaviorName` that
+share the same observations and action types (described in their
+`BehaviorSpec`). You can think of an Agent Behavior as a group of agents that
+share the same policy. All Agents with the same behavior have the same goal
+and reward signals.
+
+To communicate with an Agent in a Unity environment from a Python program, the
+Agent in the simulation must have `Behavior Parameters` set to communicate. You
+must set the `Behavior Type` to `Default` and give it a `Behavior Name`.
+
+_Notice: Currently communication between Unity and Python takes place over an
+open socket without authentication. As such, please make sure that the network
+where training takes place is secure. This will be addressed in a future
+release._
+
+## Loading a Unity Environment
+
+Python-side communication happens through `UnityEnvironment`, which is located
+in [`environment.py`](../ml-agents-envs/mlagents_envs/environment.py). To load
+a Unity environment from a built binary file, put the file in the same
+directory as `envs`. For example, if the filename of your Unity environment is
+`3DBall`, in Python, run:
+
+```python
+from mlagents_envs.environment import UnityEnvironment
+# This is a non-blocking call that only loads the environment.
+env = UnityEnvironment(file_name="3DBall", seed=1, side_channels=[])
+# Start interacting with the environment.
+env.reset()
+behavior_names = env.behavior_specs.keys()
+...
+```
+**NOTE:** Please read [Interacting with a Unity Environment](#interacting-with-a-unity-environment)
+to learn more about how you can interact with the Unity environment from Python.
+
+- `file_name` is the name of the environment binary (located in the root
+  directory of the python project).
+- `worker_id` indicates which port to use for communication with the
+  environment. For use in parallel training regimes such as A3C.
+- `seed` indicates the seed to use when generating random numbers during the
+  training process. In environments which are stochastic, setting the seed
+  enables reproducible experimentation by ensuring that the environment and
+  trainers utilize the same random seed.
+- `side_channels` provides a way to exchange data with the Unity simulation
+  that is not related to the reinforcement learning loop. For example:
+  configurations or properties. More on them in the
+  [Side Channels](Custom-SideChannels.md) doc.
+
+If you want to interact directly with the Editor, use `file_name=None` and
+press the **Play** button in the Editor when the message
+_"Start training by pressing the Play button in the Unity Editor"_ is displayed
+on the screen.
+
+### Interacting with a Unity Environment
+
+#### The BaseEnv interface
+
+A `BaseEnv` has the following methods:
+
+- **Reset : `env.reset()`** Sends a signal to reset the environment. Returns
+  None.
+- **Step : `env.step()`** Sends a signal to step the environment. Returns None.
+  Note that a "step" for Python does not correspond to either Unity `Update` or
+  `FixedUpdate`. When `step()` or `reset()` is called, the Unity simulation will
+  move forward until an Agent in the simulation needs an input from Python to
+  act.
+- **Close : `env.close()`** Sends a shutdown signal to the environment and
+  terminates the communication.
+- **Behavior Specs : `env.behavior_specs`** Returns a Mapping of
+  `BehaviorName` to `BehaviorSpec` objects (read only).
+  A `BehaviorSpec` contains the observation shapes and the
+  `ActionSpec` (which defines the action shape). Note that
+  the `BehaviorSpec` for a specific group is fixed throughout the simulation.
+  The number of entries in the Mapping can change over time in the simulation
+  if new Agent behaviors are created in the simulation.
+- **Get Steps : `env.get_steps(behavior_name: str)`** Returns a tuple
+  `DecisionSteps, TerminalSteps` corresponding to the behavior_name given as
+  input. The `DecisionSteps` contains information about the state of the agents
+  **that need an action this step** and have the behavior behavior_name. The
+  `TerminalSteps` contains information about the state of the agents **whose
+  episode ended** and have the behavior behavior_name. Both `DecisionSteps` and
+  `TerminalSteps` contain information such as the observations, the rewards and
+  the agent identifiers. `DecisionSteps` also contains action masks for the next
+  action while `TerminalSteps` contains the reason for termination (did the
+  Agent reach its maximum step and was interrupted). The data is in `np.array`s
+  whose first dimension is always the number of agents. Note that the number of
+  agents is not guaranteed to remain constant during the simulation, and it is
+  not unusual to have either `DecisionSteps` or `TerminalSteps` contain no
+  Agents at all.
+- **Set Actions : `env.set_actions(behavior_name: str, action: ActionTuple)`** Sets
+  the actions for a whole agent group. `action` is an `ActionTuple`, which
+  is made up of a 2D `np.array` of `dtype=np.int32` for discrete actions, and
+  `dtype=np.float32` for continuous actions. The first dimension of each
+  `np.array` in the tuple is the number of agents that requested a decision
+  since the last call to `env.step()`. The second dimension is the number of
+  discrete or continuous actions for the corresponding array.
+- **Set Action for Agent :
+  `env.set_action_for_agent(agent_group: str, agent_id: int, action: ActionTuple)`**
+  Sets the action for a specific Agent in an agent group. `agent_group` is the
+  name of the group the Agent belongs to and `agent_id` is the integer
+  identifier of the Agent. `action` is an `ActionTuple` as described above.
+
+**Note:** If no action is provided for an agent group between two calls to
+`env.step()` then the default action will be all zeros.
+
+#### DecisionSteps and DecisionStep
+
+`DecisionSteps` (with `s`) contains information about a whole batch of Agents
+while `DecisionStep` (no `s`) only contains information about a single Agent.
+
+A `DecisionSteps` has the following fields:
+
+- `obs` is a list of numpy array observations collected by the group of agents.
+  The first dimension of each array corresponds to the batch size of the group
+  (the number of agents requesting a decision since the last call to
+  `env.step()`).
+- `reward` is a float vector of length batch size. Corresponds to the rewards
+  collected by each agent since the last simulation step.
+- `agent_id` is an int vector of length batch size containing the unique
+  identifiers of the corresponding Agents. This is used to track Agents across
+  simulation steps.
+- `action_mask` is an optional list of two dimensional arrays of booleans which
+  is only available when using multi-discrete actions.
+  Each array corresponds to an action branch. The first dimension of each
+  array is the batch size and the second contains a mask for each action of
+  the branch. If true, the action is not available for the agent during this
+  simulation step.
+
+It also has the following two methods:
+
+- `len(DecisionSteps)` Returns the number of agents requesting a decision since
+  the last call to `env.step()`.
+- `DecisionSteps[agent_id]` Returns a `DecisionStep` for the Agent with the
+  `agent_id` unique identifier.
+
+A `DecisionStep` has the following fields:
+
+- `obs` is a list of numpy array observations collected by the agent. (Each
+  array has one less dimension than the arrays in `DecisionSteps`.)
+- `reward` is a float. Corresponds to the rewards collected by the agent since
+  the last simulation step.
+- `agent_id` is an int and a unique identifier for the corresponding Agent.
+- `action_mask` is an optional list of one dimensional arrays of booleans which
+  is only available when using multi-discrete actions. Each array corresponds
+  to an action branch. Each array contains a mask for each action of the
+  branch. If true, the action is not available for the agent during this
+  simulation step.
+
+#### TerminalSteps and TerminalStep
+
+Similarly to `DecisionSteps` and `DecisionStep`, `TerminalSteps` (with `s`)
+contains information about a whole batch of Agents while `TerminalStep` (no
+`s`) only contains information about a single Agent.
+
+A `TerminalSteps` has the following fields:
+
+- `obs` is a list of numpy array observations collected by the group of agents.
+  The first dimension of each array corresponds to the batch size of the group
+  (the number of agents whose episode ended since the last call to
+  `env.step()`).
+- `reward` is a float vector of length batch size. Corresponds to the rewards
+  collected by each agent since the last simulation step.
+- `agent_id` is an int vector of length batch size containing the unique
+  identifiers of the corresponding Agents. This is used to track Agents across
+  simulation steps.
+- `interrupted` is an array of booleans of length batch size. Is true if the
+  associated Agent was interrupted since the last decision step. For example,
+  if the Agent reached the maximum number of steps for the episode.
+
+It also has the following two methods:
+
+- `len(TerminalSteps)` Returns the number of agents whose episode ended since
+  the last call to `env.step()`.
+- `TerminalSteps[agent_id]` Returns a `TerminalStep` for the Agent with the
+  `agent_id` unique identifier.
+
+A `TerminalStep` has the following fields:
+
+- `obs` is a list of numpy array observations collected by the agent. (Each
+  array has one less dimension than the arrays in `TerminalSteps`.)
+- `reward` is a float. Corresponds to the rewards collected by the agent since
+  the last simulation step.
+- `agent_id` is an int and a unique identifier for the corresponding Agent.
+- `interrupted` is a bool. Is true if the Agent was interrupted since the last
+  decision step. For example, if the Agent reached the maximum number of steps
+  for the episode.
+
+#### BehaviorSpec
+
+A `BehaviorSpec` has the following fields:
+
+- `observation_specs` is a List of `ObservationSpec` objects: Each
+  `ObservationSpec` corresponds to an observation's properties: `shape` is a
+  tuple of ints that corresponds to the shape of the observation (without the
+  number of agents dimension).
+  `dimension_property` is a tuple of flags containing extra information about
+  how the data should be processed in the corresponding dimension.
+  `observation_type` is an enum corresponding to what type of observation is
+  generating the data (i.e., default, goal, etc.). Note that the
+  `ObservationSpec` entries have the same ordering as the observations in
+  DecisionSteps, DecisionStep, TerminalSteps and TerminalStep.
+- `action_spec` is an `ActionSpec` namedtuple that defines the number and types
+  of actions for the Agent.
+
+An `ActionSpec` has the following fields and properties:
+- `continuous_size` is the number of floats that constitute the continuous
+  actions.
+- `discrete_size` is the number of branches (the number of independent actions)
+  that constitute the multi-discrete actions.
+- `discrete_branches` is a Tuple of ints. Each int corresponds to the number of
+  different options for each branch of the action. For example, in a game with
+  a direction input (no movement, left, right) and a jump input (no jump,
+  jump), there will be two branches (direction and jump), the first one with 3
+  options and the second with 2 options (`discrete_size = 2` and
+  `discrete_branches = (3, 2)`).
+
+
+### Communicating additional information with the Environment
+
+In addition to the means of communicating between Unity and Python described
+above, we also provide methods for sharing agent-agnostic information. These
+additional methods are referred to as side channels. ML-Agents includes two
+ready-made side channels, described below. It is also possible to create custom
+side channels to communicate any additional data between a Unity environment
+and Python. Instructions for creating custom side channels can be found
+[here](Custom-SideChannels.md).
+
+Side channels exist as separate classes which are instantiated, and then passed
+as a list to the `side_channels` argument of the constructor of the
+`UnityEnvironment` class.
+
+```python
+channel = MyChannel()
+
+env = UnityEnvironment(side_channels=[channel])
+```
+
+**Note:** A side channel will only send/receive messages when `env.step()` or
+`env.reset()` is called.
+
+#### EngineConfigurationChannel
+
+The `EngineConfigurationChannel` side channel allows you to modify the
+time-scale, resolution, and graphics quality of the environment. This can be
+useful for adjusting the environment to perform better during training, or to
+be more interpretable during inference.
+
+`EngineConfigurationChannel` has two methods:
+
+- `set_configuration_parameters` which takes the following arguments:
+  - `width`: Defines the width of the display. (Must be set alongside height.)
+  - `height`: Defines the height of the display. (Must be set alongside width.)
+  - `quality_level`: Defines the quality level of the simulation.
+  - `time_scale`: Defines the multiplier for the deltatime in the simulation. If
+    set to a higher value, time will pass faster in the simulation but the
+    physics may perform unpredictably.
+  - `target_frame_rate`: Instructs the simulation to try to render at a
+    specified frame rate.
+  - `capture_frame_rate`: Instructs the simulation to consider time between
+    updates to always be constant, regardless of the actual frame rate.
+- `set_configuration` with a `config` argument, which is an `EngineConfig`
+  NamedTuple object.
+
+For example, the following code would adjust the time-scale of the simulation
+to be 2x realtime.
+ +```python +from mlagents_envs.environment import UnityEnvironment +from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel + +channel = EngineConfigurationChannel() + +env = UnityEnvironment(side_channels=[channel]) + +channel.set_configuration_parameters(time_scale = 2.0) + +i = env.reset() +... +``` + +#### EnvironmentParameters + +The `EnvironmentParameters` will allow you to get and set pre-defined numerical +values in the environment. This can be useful for adjusting environment-specific +settings, or for reading non-agent related information from the environment. You +can call `get_property` and `set_property` on the side channel to read and write +properties. + +`EnvironmentParametersChannel` has one methods: + +- `set_float_parameter` Sets a float parameter in the Unity Environment. + - key: The string identifier of the property. + - value: The float value of the property. + +```python +from mlagents_envs.environment import UnityEnvironment +from mlagents_envs.side_channel.environment_parameters_channel import EnvironmentParametersChannel + +channel = EnvironmentParametersChannel() + +env = UnityEnvironment(side_channels=[channel]) + +channel.set_float_parameter("parameter_1", 2.0) + +i = env.reset() +... +``` + +Once a property has been modified in Python, you can access it in C# after the +next call to `step` as follows: + +```csharp +var envParameters = Academy.Instance.EnvironmentParameters; +float property1 = envParameters.GetWithDefault("parameter_1", 0.0f); +``` + +#### Custom side channels + +For information on how to make custom side channels for sending additional data +types, see the documentation [here](Custom-SideChannels.md). diff --git a/com.unity.ml-agents/Documentation~/Python-On-Off-Policy-Trainer-Documentation.md b/com.unity.ml-agents/Documentation~/Python-On-Off-Policy-Trainer-Documentation.md new file mode 100644 index 0000000000..e2fc7770c7 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Python-On-Off-Policy-Trainer-Documentation.md @@ -0,0 +1,787 @@ +# Table of Contents + +* [mlagents.trainers.trainer.on\_policy\_trainer](#mlagents.trainers.trainer.on_policy_trainer) + * [OnPolicyTrainer](#mlagents.trainers.trainer.on_policy_trainer.OnPolicyTrainer) + * [\_\_init\_\_](#mlagents.trainers.trainer.on_policy_trainer.OnPolicyTrainer.__init__) + * [add\_policy](#mlagents.trainers.trainer.on_policy_trainer.OnPolicyTrainer.add_policy) +* [mlagents.trainers.trainer.off\_policy\_trainer](#mlagents.trainers.trainer.off_policy_trainer) + * [OffPolicyTrainer](#mlagents.trainers.trainer.off_policy_trainer.OffPolicyTrainer) + * [\_\_init\_\_](#mlagents.trainers.trainer.off_policy_trainer.OffPolicyTrainer.__init__) + * [save\_model](#mlagents.trainers.trainer.off_policy_trainer.OffPolicyTrainer.save_model) + * [save\_replay\_buffer](#mlagents.trainers.trainer.off_policy_trainer.OffPolicyTrainer.save_replay_buffer) + * [load\_replay\_buffer](#mlagents.trainers.trainer.off_policy_trainer.OffPolicyTrainer.load_replay_buffer) + * [add\_policy](#mlagents.trainers.trainer.off_policy_trainer.OffPolicyTrainer.add_policy) +* [mlagents.trainers.trainer.rl\_trainer](#mlagents.trainers.trainer.rl_trainer) + * [RLTrainer](#mlagents.trainers.trainer.rl_trainer.RLTrainer) + * [end\_episode](#mlagents.trainers.trainer.rl_trainer.RLTrainer.end_episode) + * [create\_optimizer](#mlagents.trainers.trainer.rl_trainer.RLTrainer.create_optimizer) + * [save\_model](#mlagents.trainers.trainer.rl_trainer.RLTrainer.save_model) + * 
[advance](#mlagents.trainers.trainer.rl_trainer.RLTrainer.advance) +* [mlagents.trainers.trainer.trainer](#mlagents.trainers.trainer.trainer) + * [Trainer](#mlagents.trainers.trainer.trainer.Trainer) + * [\_\_init\_\_](#mlagents.trainers.trainer.trainer.Trainer.__init__) + * [stats\_reporter](#mlagents.trainers.trainer.trainer.Trainer.stats_reporter) + * [parameters](#mlagents.trainers.trainer.trainer.Trainer.parameters) + * [get\_max\_steps](#mlagents.trainers.trainer.trainer.Trainer.get_max_steps) + * [get\_step](#mlagents.trainers.trainer.trainer.Trainer.get_step) + * [threaded](#mlagents.trainers.trainer.trainer.Trainer.threaded) + * [should\_still\_train](#mlagents.trainers.trainer.trainer.Trainer.should_still_train) + * [reward\_buffer](#mlagents.trainers.trainer.trainer.Trainer.reward_buffer) + * [save\_model](#mlagents.trainers.trainer.trainer.Trainer.save_model) + * [end\_episode](#mlagents.trainers.trainer.trainer.Trainer.end_episode) + * [create\_policy](#mlagents.trainers.trainer.trainer.Trainer.create_policy) + * [add\_policy](#mlagents.trainers.trainer.trainer.Trainer.add_policy) + * [get\_policy](#mlagents.trainers.trainer.trainer.Trainer.get_policy) + * [advance](#mlagents.trainers.trainer.trainer.Trainer.advance) + * [publish\_policy\_queue](#mlagents.trainers.trainer.trainer.Trainer.publish_policy_queue) + * [subscribe\_trajectory\_queue](#mlagents.trainers.trainer.trainer.Trainer.subscribe_trajectory_queue) +* [mlagents.trainers.settings](#mlagents.trainers.settings) + * [deep\_update\_dict](#mlagents.trainers.settings.deep_update_dict) + * [RewardSignalSettings](#mlagents.trainers.settings.RewardSignalSettings) + * [structure](#mlagents.trainers.settings.RewardSignalSettings.structure) + * [ParameterRandomizationSettings](#mlagents.trainers.settings.ParameterRandomizationSettings) + * [\_\_str\_\_](#mlagents.trainers.settings.ParameterRandomizationSettings.__str__) + * [structure](#mlagents.trainers.settings.ParameterRandomizationSettings.structure) + * [unstructure](#mlagents.trainers.settings.ParameterRandomizationSettings.unstructure) + * [apply](#mlagents.trainers.settings.ParameterRandomizationSettings.apply) + * [ConstantSettings](#mlagents.trainers.settings.ConstantSettings) + * [\_\_str\_\_](#mlagents.trainers.settings.ConstantSettings.__str__) + * [apply](#mlagents.trainers.settings.ConstantSettings.apply) + * [UniformSettings](#mlagents.trainers.settings.UniformSettings) + * [\_\_str\_\_](#mlagents.trainers.settings.UniformSettings.__str__) + * [apply](#mlagents.trainers.settings.UniformSettings.apply) + * [GaussianSettings](#mlagents.trainers.settings.GaussianSettings) + * [\_\_str\_\_](#mlagents.trainers.settings.GaussianSettings.__str__) + * [apply](#mlagents.trainers.settings.GaussianSettings.apply) + * [MultiRangeUniformSettings](#mlagents.trainers.settings.MultiRangeUniformSettings) + * [\_\_str\_\_](#mlagents.trainers.settings.MultiRangeUniformSettings.__str__) + * [apply](#mlagents.trainers.settings.MultiRangeUniformSettings.apply) + * [CompletionCriteriaSettings](#mlagents.trainers.settings.CompletionCriteriaSettings) + * [need\_increment](#mlagents.trainers.settings.CompletionCriteriaSettings.need_increment) + * [Lesson](#mlagents.trainers.settings.Lesson) + * [EnvironmentParameterSettings](#mlagents.trainers.settings.EnvironmentParameterSettings) + * [structure](#mlagents.trainers.settings.EnvironmentParameterSettings.structure) + * [TrainerSettings](#mlagents.trainers.settings.TrainerSettings) + * 
[structure](#mlagents.trainers.settings.TrainerSettings.structure) + * [CheckpointSettings](#mlagents.trainers.settings.CheckpointSettings) + * [prioritize\_resume\_init](#mlagents.trainers.settings.CheckpointSettings.prioritize_resume_init) + * [RunOptions](#mlagents.trainers.settings.RunOptions) + * [from\_argparse](#mlagents.trainers.settings.RunOptions.from_argparse) + + +# mlagents.trainers.trainer.on\_policy\_trainer + + +## OnPolicyTrainer Objects + +```python +class OnPolicyTrainer(RLTrainer) +``` + +The PPOTrainer is an implementation of the PPO algorithm. + + +#### \_\_init\_\_ + +```python + | __init__(behavior_name: str, reward_buff_cap: int, trainer_settings: TrainerSettings, training: bool, load: bool, seed: int, artifact_path: str) +``` + +Responsible for collecting experiences and training an on-policy model. + +**Arguments**: + +- `behavior_name`: The name of the behavior associated with trainer config +- `reward_buff_cap`: Max reward history to track in the reward buffer +- `trainer_settings`: The parameters for the trainer. +- `training`: Whether the trainer is set for training. +- `load`: Whether the model should be loaded. +- `seed`: The seed the model will be initialized with +- `artifact_path`: The directory within which to store artifacts from this trainer. + + +#### add\_policy + +```python + | add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None +``` + +Adds policy to trainer. + +**Arguments**: + +- `parsed_behavior_id`: Behavior identifiers that the policy should belong to. +- `policy`: Policy to associate with name_behavior_id. + + +# mlagents.trainers.trainer.off\_policy\_trainer + + +## OffPolicyTrainer Objects + +```python +class OffPolicyTrainer(RLTrainer) +``` + +The SACTrainer is an implementation of the SAC algorithm, with support +for discrete actions and recurrent networks. + + +#### \_\_init\_\_ + +```python + | __init__(behavior_name: str, reward_buff_cap: int, trainer_settings: TrainerSettings, training: bool, load: bool, seed: int, artifact_path: str) +``` + +Responsible for collecting experiences and training an off-policy model. + +**Arguments**: + +- `behavior_name`: The name of the behavior associated with trainer config +- `reward_buff_cap`: Max reward history to track in the reward buffer +- `trainer_settings`: The parameters for the trainer. +- `training`: Whether the trainer is set for training. +- `load`: Whether the model should be loaded. +- `seed`: The seed the model will be initialized with +- `artifact_path`: The directory within which to store artifacts from this trainer. + + +#### save\_model + +```python + | save_model() -> None +``` + +Saves the final training model to memory +Overrides the default to save the replay buffer. + + +#### save\_replay\_buffer + +```python + | save_replay_buffer() -> None +``` + +Save the training buffer's update buffer to a pickle file. + + +#### load\_replay\_buffer + +```python + | load_replay_buffer() -> None +``` + +Loads the last saved replay buffer from a file. + + +#### add\_policy + +```python + | add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None +``` + +Adds policy to trainer. + + +# mlagents.trainers.trainer.rl\_trainer + + +## RLTrainer Objects + +```python +class RLTrainer(Trainer) +``` + +This class is the base class for trainers that use Reward Signals. + + +#### end\_episode + +```python + | end_episode() -> None +``` + +A signal that the Episode has ended. The buffer must be reset. +Get only called when the academy resets. 
+ + +#### create\_optimizer + +```python + | @abc.abstractmethod + | create_optimizer() -> TorchOptimizer +``` + +Creates an Optimizer object + + +#### save\_model + +```python + | save_model() -> None +``` + +Saves the policy associated with this trainer. + + +#### advance + +```python + | advance() -> None +``` + +Steps the trainer, taking in trajectories and updates if ready. +Will block and wait briefly if there are no trajectories. + + +# mlagents.trainers.trainer.trainer + + +## Trainer Objects + +```python +class Trainer(abc.ABC) +``` + +This class is the base class for the mlagents_envs.trainers + + +#### \_\_init\_\_ + +```python + | __init__(brain_name: str, trainer_settings: TrainerSettings, training: bool, load: bool, artifact_path: str, reward_buff_cap: int = 1) +``` + +Responsible for collecting experiences and training a neural network model. + +**Arguments**: + +- `brain_name`: Brain name of brain to be trained. +- `trainer_settings`: The parameters for the trainer (dictionary). +- `training`: Whether the trainer is set for training. +- `artifact_path`: The directory within which to store artifacts from this trainer +- `reward_buff_cap`: + + +#### stats\_reporter + +```python + | @property + | stats_reporter() +``` + +Returns the stats reporter associated with this Trainer. + + +#### parameters + +```python + | @property + | parameters() -> TrainerSettings +``` + +Returns the trainer parameters of the trainer. + + +#### get\_max\_steps + +```python + | @property + | get_max_steps() -> int +``` + +Returns the maximum number of steps. Is used to know when the trainer should be stopped. + +**Returns**: + +The maximum number of steps of the trainer + + +#### get\_step + +```python + | @property + | get_step() -> int +``` + +Returns the number of steps the trainer has performed + +**Returns**: + +the step count of the trainer + + +#### threaded + +```python + | @property + | threaded() -> bool +``` + +Whether or not to run the trainer in a thread. True allows the trainer to +update the policy while the environment is taking steps. Set to False to +enforce strict on-policy updates (i.e. don't update the policy when taking steps.) + + +#### should\_still\_train + +```python + | @property + | should_still_train() -> bool +``` + +Returns whether or not the trainer should train. A Trainer could +stop training if it wasn't training to begin with, or if max_steps +is reached. + + +#### reward\_buffer + +```python + | @property + | reward_buffer() -> Deque[float] +``` + +Returns the reward buffer. The reward buffer contains the cumulative +rewards of the most recent episodes completed by agents using this +trainer. + +**Returns**: + +the reward buffer. + + +#### save\_model + +```python + | @abc.abstractmethod + | save_model() -> None +``` + +Saves model file(s) for the policy or policies associated with this trainer. + + +#### end\_episode + +```python + | @abc.abstractmethod + | end_episode() +``` + +A signal that the Episode has ended. The buffer must be reset. +Get only called when the academy resets. + + +#### create\_policy + +```python + | @abc.abstractmethod + | create_policy(parsed_behavior_id: BehaviorIdentifiers, behavior_spec: BehaviorSpec) -> Policy +``` + +Creates a Policy object + + +#### add\_policy + +```python + | @abc.abstractmethod + | add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None +``` + +Adds policy to trainer. 
+ + +#### get\_policy + +```python + | get_policy(name_behavior_id: str) -> Policy +``` + +Gets policy associated with name_behavior_id + +**Arguments**: + +- `name_behavior_id`: Fully qualified behavior name + +**Returns**: + +Policy associated with name_behavior_id + + +#### advance + +```python + | @abc.abstractmethod + | advance() -> None +``` + +Advances the trainer. Typically, this means grabbing trajectories +from all subscribed trajectory queues (self.trajectory_queues), and updating +a policy using the steps in them, and if needed pushing a new policy onto the right +policy queues (self.policy_queues). + + +#### publish\_policy\_queue + +```python + | publish_policy_queue(policy_queue: AgentManagerQueue[Policy]) -> None +``` + +Adds a policy queue to the list of queues to publish to when this Trainer +makes a policy update + +**Arguments**: + +- `policy_queue`: Policy queue to publish to. + + +#### subscribe\_trajectory\_queue + +```python + | subscribe_trajectory_queue(trajectory_queue: AgentManagerQueue[Trajectory]) -> None +``` + +Adds a trajectory queue to the list of queues for the trainer to ingest Trajectories from. + +**Arguments**: + +- `trajectory_queue`: Trajectory queue to read from. + + +# mlagents.trainers.settings + + +#### deep\_update\_dict + +```python +deep_update_dict(d: Dict, update_d: Mapping) -> None +``` + +Similar to dict.update(), but works for nested dicts of dicts as well. + + +## RewardSignalSettings Objects + +```python +@attr.s(auto_attribs=True) +class RewardSignalSettings() +``` + + +#### structure + +```python + | @staticmethod + | structure(d: Mapping, t: type) -> Any +``` + +Helper method to structure a Dict of RewardSignalSettings class. Meant to be registered with +cattr.register_structure_hook() and called with cattr.structure(). This is needed to handle +the special Enum selection of RewardSignalSettings classes. + + +## ParameterRandomizationSettings Objects + +```python +@attr.s(auto_attribs=True) +class ParameterRandomizationSettings(abc.ABC) +``` + + +#### \_\_str\_\_ + +```python + | __str__() -> str +``` + +Helper method to output sampler stats to console. + + +#### structure + +```python + | @staticmethod + | structure(d: Union[Mapping, float], t: type) -> "ParameterRandomizationSettings" +``` + +Helper method to a ParameterRandomizationSettings class. Meant to be registered with +cattr.register_structure_hook() and called with cattr.structure(). This is needed to handle +the special Enum selection of ParameterRandomizationSettings classes. + + +#### unstructure + +```python + | @staticmethod + | unstructure(d: "ParameterRandomizationSettings") -> Mapping +``` + +Helper method to a ParameterRandomizationSettings class. Meant to be registered with +cattr.register_unstructure_hook() and called with cattr.unstructure(). + + +#### apply + +```python + | @abc.abstractmethod + | apply(key: str, env_channel: EnvironmentParametersChannel) -> None +``` + +Helper method to send sampler settings over EnvironmentParametersChannel +Calls the appropriate sampler type set method. + +**Arguments**: + +- `key`: environment parameter to be sampled +- `env_channel`: The EnvironmentParametersChannel to communicate sampler settings to environment + + +## ConstantSettings Objects + +```python +@attr.s(auto_attribs=True) +class ConstantSettings(ParameterRandomizationSettings) +``` + + +#### \_\_str\_\_ + +```python + | __str__() -> str +``` + +Helper method to output sampler stats to console. 
+ + +#### apply + +```python + | apply(key: str, env_channel: EnvironmentParametersChannel) -> None +``` + +Helper method to send sampler settings over EnvironmentParametersChannel +Calls the constant sampler type set method. + +**Arguments**: + +- `key`: environment parameter to be sampled +- `env_channel`: The EnvironmentParametersChannel to communicate sampler settings to environment + + +## UniformSettings Objects + +```python +@attr.s(auto_attribs=True) +class UniformSettings(ParameterRandomizationSettings) +``` + + +#### \_\_str\_\_ + +```python + | __str__() -> str +``` + +Helper method to output sampler stats to console. + + +#### apply + +```python + | apply(key: str, env_channel: EnvironmentParametersChannel) -> None +``` + +Helper method to send sampler settings over EnvironmentParametersChannel +Calls the uniform sampler type set method. + +**Arguments**: + +- `key`: environment parameter to be sampled +- `env_channel`: The EnvironmentParametersChannel to communicate sampler settings to environment + + +## GaussianSettings Objects + +```python +@attr.s(auto_attribs=True) +class GaussianSettings(ParameterRandomizationSettings) +``` + + +#### \_\_str\_\_ + +```python + | __str__() -> str +``` + +Helper method to output sampler stats to console. + + +#### apply + +```python + | apply(key: str, env_channel: EnvironmentParametersChannel) -> None +``` + +Helper method to send sampler settings over EnvironmentParametersChannel +Calls the gaussian sampler type set method. + +**Arguments**: + +- `key`: environment parameter to be sampled +- `env_channel`: The EnvironmentParametersChannel to communicate sampler settings to environment + + +## MultiRangeUniformSettings Objects + +```python +@attr.s(auto_attribs=True) +class MultiRangeUniformSettings(ParameterRandomizationSettings) +``` + + +#### \_\_str\_\_ + +```python + | __str__() -> str +``` + +Helper method to output sampler stats to console. + + +#### apply + +```python + | apply(key: str, env_channel: EnvironmentParametersChannel) -> None +``` + +Helper method to send sampler settings over EnvironmentParametersChannel +Calls the multirangeuniform sampler type set method. + +**Arguments**: + +- `key`: environment parameter to be sampled +- `env_channel`: The EnvironmentParametersChannel to communicate sampler settings to environment + + +## CompletionCriteriaSettings Objects + +```python +@attr.s(auto_attribs=True) +class CompletionCriteriaSettings() +``` + +CompletionCriteriaSettings contains the information needed to figure out if the next +lesson must start. + + +#### need\_increment + +```python + | need_increment(progress: float, reward_buffer: List[float], smoothing: float) -> Tuple[bool, float] +``` + +Given measures, this method returns a boolean indicating if the lesson +needs to change now, and a float corresponding to the new smoothed value. + + +## Lesson Objects + +```python +@attr.s(auto_attribs=True) +class Lesson() +``` + +Gathers the data of one lesson for one environment parameter including its name, +the condition that must be fulfilled for the lesson to be completed and a sampler +for the environment parameter. If the completion_criteria is None, then this is +the last lesson in the curriculum. + + +## EnvironmentParameterSettings Objects + +```python +@attr.s(auto_attribs=True) +class EnvironmentParameterSettings() +``` + +EnvironmentParameterSettings is an ordered list of lessons for one environment +parameter. 
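The curriculum classes above are easiest to understand through the kind of check that `need_increment` performs. The sketch below is a conceptual illustration only: the `threshold`, `min_lesson_length`, and smoothing weights are hypothetical stand-ins rather than the actual fields of `CompletionCriteriaSettings`, and it mirrors only a reward-based completion criterion.

```python
# Conceptual illustration of a lesson-completion check with smoothing.
# Not the ML-Agents implementation; names and constants are placeholders.
from typing import List, Tuple


def need_increment_sketch(
    reward_buffer: List[float],
    smoothed_reward: float,
    threshold: float,
    min_lesson_length: int,
    smoothing_weight: float = 0.75,
) -> Tuple[bool, float]:
    if len(reward_buffer) < min_lesson_length:
        # Not enough completed episodes yet to judge the current lesson.
        return False, smoothed_reward
    mean_reward = sum(reward_buffer) / len(reward_buffer)
    # Smooth the measure so one lucky batch of episodes does not trigger
    # the next lesson prematurely.
    new_smoothed = (
        smoothing_weight * smoothed_reward
        + (1.0 - smoothing_weight) * mean_reward
    )
    return new_smoothed >= threshold, new_smoothed
```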
+ + +#### structure + +```python + | @staticmethod + | structure(d: Mapping, t: type) -> Dict[str, "EnvironmentParameterSettings"] +``` + +Helper method to structure a Dict of EnvironmentParameterSettings class. Meant +to be registered with cattr.register_structure_hook() and called with +cattr.structure(). + + +## TrainerSettings Objects + +```python +@attr.s(auto_attribs=True) +class TrainerSettings(ExportableSettings) +``` + + +#### structure + +```python + | @staticmethod + | structure(d: Mapping, t: type) -> Any +``` + +Helper method to structure a TrainerSettings class. Meant to be registered with +cattr.register_structure_hook() and called with cattr.structure(). + + +## CheckpointSettings Objects + +```python +@attr.s(auto_attribs=True) +class CheckpointSettings() +``` + + +#### prioritize\_resume\_init + +```python + | prioritize_resume_init() -> None +``` + +Prioritize explicit command line resume/init over conflicting yaml options. +if both resume/init are set at one place use resume + + +## RunOptions Objects + +```python +@attr.s(auto_attribs=True) +class RunOptions(ExportableSettings) +``` + + +#### from\_argparse + +```python + | @staticmethod + | from_argparse(args: argparse.Namespace) -> "RunOptions" +``` + +Takes an argparse.Namespace as specified in `parse_command_line`, loads input configuration files +from file paths, and converts to a RunOptions instance. + +**Arguments**: + +- `args`: collection of command-line parameters passed to mlagents-learn + +**Returns**: + +RunOptions representing the passed in arguments, with trainer config, curriculum and sampler +configs loaded from files. diff --git a/com.unity.ml-agents/Documentation~/Python-Optimizer-Documentation.md b/com.unity.ml-agents/Documentation~/Python-Optimizer-Documentation.md new file mode 100644 index 0000000000..7cdfaec832 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Python-Optimizer-Documentation.md @@ -0,0 +1,87 @@ +# Table of Contents + +* [mlagents.trainers.optimizer.torch\_optimizer](#mlagents.trainers.optimizer.torch_optimizer) + * [TorchOptimizer](#mlagents.trainers.optimizer.torch_optimizer.TorchOptimizer) + * [create\_reward\_signals](#mlagents.trainers.optimizer.torch_optimizer.TorchOptimizer.create_reward_signals) + * [get\_trajectory\_value\_estimates](#mlagents.trainers.optimizer.torch_optimizer.TorchOptimizer.get_trajectory_value_estimates) +* [mlagents.trainers.optimizer.optimizer](#mlagents.trainers.optimizer.optimizer) + * [Optimizer](#mlagents.trainers.optimizer.optimizer.Optimizer) + * [update](#mlagents.trainers.optimizer.optimizer.Optimizer.update) + + +# mlagents.trainers.optimizer.torch\_optimizer + + +## TorchOptimizer Objects + +```python +class TorchOptimizer(Optimizer) +``` + + +#### create\_reward\_signals + +```python + | create_reward_signals(reward_signal_configs: Dict[RewardSignalType, RewardSignalSettings]) -> None +``` + +Create reward signals + +**Arguments**: + +- `reward_signal_configs`: Reward signal config. + + +#### get\_trajectory\_value\_estimates + +```python + | get_trajectory_value_estimates(batch: AgentBuffer, next_obs: List[np.ndarray], done: bool, agent_id: str = "") -> Tuple[Dict[str, np.ndarray], Dict[str, float], Optional[AgentBufferField]] +``` + +Get value estimates and memories for a trajectory, in batch form. + +**Arguments**: + +- `batch`: An AgentBuffer that consists of a trajectory. +- `next_obs`: the next observation (after the trajectory). Used for bootstrapping + if this is not a terminal trajectory. 
+- `done`: Set true if this is a terminal trajectory. +- `agent_id`: Agent ID of the agent that this trajectory belongs to. + +**Returns**: + +A Tuple of the Value Estimates as a Dict of [name, np.ndarray(trajectory_len)], + the final value estimate as a Dict of [name, float], and optionally (if using memories) + an AgentBufferField of initial critic memories to be used during update. + + +# mlagents.trainers.optimizer.optimizer + + +## Optimizer Objects + +```python +class Optimizer(abc.ABC) +``` + +Creates loss functions and auxillary networks (e.g. Q or Value) needed for training. +Provides methods to update the Policy. + + +#### update + +```python + | @abc.abstractmethod + | update(batch: AgentBuffer, num_sequences: int) -> Dict[str, float] +``` + +Update the Policy based on the batch that was passed in. + +**Arguments**: + +- `batch`: AgentBuffer that contains the minibatch of data used for this update. +- `num_sequences`: Number of recurrent sequences found in the minibatch. + +**Returns**: + +A Dict containing statistics (name, value) from the update (e.g. loss) diff --git a/com.unity.ml-agents/Documentation~/Python-PettingZoo-API-Documentation.md b/com.unity.ml-agents/Documentation~/Python-PettingZoo-API-Documentation.md new file mode 100644 index 0000000000..233e45e805 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Python-PettingZoo-API-Documentation.md @@ -0,0 +1,246 @@ +# Table of Contents + +* [mlagents\_envs.envs.pettingzoo\_env\_factory](#mlagents_envs.envs.pettingzoo_env_factory) + * [PettingZooEnvFactory](#mlagents_envs.envs.pettingzoo_env_factory.PettingZooEnvFactory) + * [env](#mlagents_envs.envs.pettingzoo_env_factory.PettingZooEnvFactory.env) +* [mlagents\_envs.envs.unity\_aec\_env](#mlagents_envs.envs.unity_aec_env) + * [UnityAECEnv](#mlagents_envs.envs.unity_aec_env.UnityAECEnv) + * [\_\_init\_\_](#mlagents_envs.envs.unity_aec_env.UnityAECEnv.__init__) + * [step](#mlagents_envs.envs.unity_aec_env.UnityAECEnv.step) + * [observe](#mlagents_envs.envs.unity_aec_env.UnityAECEnv.observe) + * [last](#mlagents_envs.envs.unity_aec_env.UnityAECEnv.last) +* [mlagents\_envs.envs.unity\_parallel\_env](#mlagents_envs.envs.unity_parallel_env) + * [UnityParallelEnv](#mlagents_envs.envs.unity_parallel_env.UnityParallelEnv) + * [\_\_init\_\_](#mlagents_envs.envs.unity_parallel_env.UnityParallelEnv.__init__) + * [reset](#mlagents_envs.envs.unity_parallel_env.UnityParallelEnv.reset) +* [mlagents\_envs.envs.unity\_pettingzoo\_base\_env](#mlagents_envs.envs.unity_pettingzoo_base_env) + * [UnityPettingzooBaseEnv](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv) + * [observation\_spaces](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.observation_spaces) + * [observation\_space](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.observation_space) + * [action\_spaces](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.action_spaces) + * [action\_space](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.action_space) + * [side\_channel](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.side_channel) + * [reset](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.reset) + * [seed](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.seed) + * [render](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.render) + * [close](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.close) + + +# 
mlagents\_envs.envs.pettingzoo\_env\_factory + + +## PettingZooEnvFactory Objects + +```python +class PettingZooEnvFactory() +``` + + +#### env + +```python + | env(seed: Optional[int] = None, **kwargs: Union[List, int, bool, None]) -> UnityAECEnv +``` + +Creates the environment with env_id from unity's default_registry and wraps it in a UnityToPettingZooWrapper + +**Arguments**: + +- `seed`: The seed for the action spaces of the agents. +- `kwargs`: Any argument accepted by `UnityEnvironment`class except file_name + + +# mlagents\_envs.envs.unity\_aec\_env + + +## UnityAECEnv Objects + +```python +class UnityAECEnv(UnityPettingzooBaseEnv, AECEnv) +``` + +Unity AEC (PettingZoo) environment wrapper. + + +#### \_\_init\_\_ + +```python + | __init__(env: BaseEnv, seed: Optional[int] = None) +``` + +Initializes a Unity AEC environment wrapper. + +**Arguments**: + +- `env`: The UnityEnvironment that is being wrapped. +- `seed`: The seed for the action spaces of the agents. + + +#### step + +```python + | step(action: Any) -> None +``` + +Sets the action of the active agent and get the observation, reward, done +and info of the next agent. + +**Arguments**: + +- `action`: The action for the active agent + + +#### observe + +```python + | observe(agent_id) +``` + +Returns the observation an agent currently can make. `last()` calls this function. + + +#### last + +```python + | last(observe=True) +``` + +returns observation, cumulative reward, done, info for the current agent (specified by self.agent_selection) + + +# mlagents\_envs.envs.unity\_parallel\_env + + +## UnityParallelEnv Objects + +```python +class UnityParallelEnv(UnityPettingzooBaseEnv, ParallelEnv) +``` + +Unity Parallel (PettingZoo) environment wrapper. + + +#### \_\_init\_\_ + +```python + | __init__(env: BaseEnv, seed: Optional[int] = None) +``` + +Initializes a Unity Parallel environment wrapper. + +**Arguments**: + +- `env`: The UnityEnvironment that is being wrapped. +- `seed`: The seed for the action spaces of the agents. + + +#### reset + +```python + | reset() -> Dict[str, Any] +``` + +Resets the environment. + + +# mlagents\_envs.envs.unity\_pettingzoo\_base\_env + + +## UnityPettingzooBaseEnv Objects + +```python +class UnityPettingzooBaseEnv() +``` + +Unity Petting Zoo base environment. + + +#### observation\_spaces + +```python + | @property + | observation_spaces() -> Dict[str, spaces.Space] +``` + +Return the observation spaces of all the agents. + + +#### observation\_space + +```python + | observation_space(agent: str) -> Optional[spaces.Space] +``` + +The observation space of the current agent. + + +#### action\_spaces + +```python + | @property + | action_spaces() -> Dict[str, spaces.Space] +``` + +Return the action spaces of all the agents. + + +#### action\_space + +```python + | action_space(agent: str) -> Optional[spaces.Space] +``` + +The action space of the current agent. + + +#### side\_channel + +```python + | @property + | side_channel() -> Dict[str, Any] +``` + +The side channels of the environment. You can access the side channels +of an environment with `env.side_channel[]`. + + +#### reset + +```python + | reset() +``` + +Resets the environment. + + +#### seed + +```python + | seed(seed=None) +``` + +Reseeds the environment (making the resulting environment deterministic). +`reset()` must be called after `seed()`, and before `step()`. + + +#### render + +```python + | render(mode="human") +``` + +NOT SUPPORTED. + +Displays a rendered frame from the environment, if supported. 
+Alternate render modes in the default environments are `'rgb_array'` +which returns a numpy array and is supported by all environments outside of classic, +and `'ansi'` which returns the strings printed (specific to classic environments). + + +#### close + +```python + | close() -> None +``` + +Close the environment. diff --git a/com.unity.ml-agents/Documentation~/Python-PettingZoo-API.md b/com.unity.ml-agents/Documentation~/Python-PettingZoo-API.md new file mode 100644 index 0000000000..2c62ed8415 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Python-PettingZoo-API.md @@ -0,0 +1,57 @@ +# Unity ML-Agents PettingZoo Wrapper + +With the increasing interest in multi-agent training with a gym-like API, we provide a +PettingZoo Wrapper around the [Petting Zoo API](https://pettingzoo.farama.org/). Our wrapper +provides interfaces on top of our `UnityEnvironment` class, which is the default way of +interfacing with a Unity environment via Python. + +## Installation and Examples + +The PettingZoo wrapper is part of the `mlagents_envs` package. Please refer to the +[mlagents_envs installation instructions](ML-Agents-Envs-README.md). + +[[Colab] PettingZoo Wrapper Example](https://colab.research.google.com/github/Unity-Technologies/ml-agents/blob/develop-python-api-ga/ml-agents-envs/colabs/Colab_PettingZoo.ipynb) + +This colab notebook demonstrates the example usage of the wrapper, including installation, +basic usages, and an example with our +[Striker vs Goalie environment](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Examples.md#strikers-vs-goalie) +which is a multi-agents environment with multiple different behavior names. + +## API interface + +This wrapper is compatible with PettingZoo API. Please check out +[PettingZoo API page](https://pettingzoo.farama.org/api/aec/) for more details. +Here's an example of interacting with wrapped environment: + +```python +from mlagents_envs.environment import UnityEnvironment +from mlagents_envs.envs import UnityToPettingZooWrapper + +unity_env = UnityEnvironment("StrikersVsGoalie") +env = UnityToPettingZooWrapper(unity_env) +env.reset() +for agent in env.agent_iter(): + observation, reward, done, info = env.last() + action = policy(observation, agent) + env.step(action) +``` + +## Notes +- There is support for both [AEC](https://pettingzoo.farama.org/api/aec/) + and [Parallel](https://pettingzoo.farama.org/api/parallel/) PettingZoo APIs. +- The AEC wrapper is compatible with PettingZoo (PZ) API interface but works in a slightly + different way under the hood. For the AEC API, Instead of stepping the environment in every `env.step(action)`, + the PZ wrapper will store the action, and will only perform environment stepping when all the + agents requesting for actions in the current step have been assigned an action. This is for + performance, considering that the communication between Unity and python is more efficient + when data are sent in batches. +- Since the actions for the AEC wrapper are stored without applying them to the environment until + all the actions are queued, some components of the API might behave in unexpected way. For example, a call + to `env.reward` should return the instantaneous reward for that particular step, but the true + reward would only be available when an actual environment step is performed. 
It's recommended that + you follow the API definition for training (access rewards from `env.last()` instead of + `env.reward`) and the underlying mechanism shouldn't affect training results. +- The environments will automatically reset when it's done, so `env.agent_iter(max_step)` will + keep going on until the specified max step is reached (default: `2**63`). There is no need to + call `env.reset()` except for the very beginning of instantiating an environment. + diff --git a/com.unity.ml-agents/Documentation~/Readme.md b/com.unity.ml-agents/Documentation~/Readme.md new file mode 100644 index 0000000000..aa6b4d8ade --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Readme.md @@ -0,0 +1,204 @@ +# Unity ML-Agents Toolkit + +[![docs badge](https://img.shields.io/badge/docs-reference-blue.svg)](https://github.com/Unity-Technologies/ml-agents/tree/release_22_docs/docs/) + +[![license badge](https://img.shields.io/badge/license-Apache--2.0-green.svg)](../LICENSE.md) + +([latest release](https://github.com/Unity-Technologies/ml-agents/releases/tag/latest_release)) +([all releases](https://github.com/Unity-Technologies/ml-agents/releases)) + +**The Unity Machine Learning Agents Toolkit** (ML-Agents) is an open-source +project that enables games and simulations to serve as environments for +training intelligent agents. We provide implementations (based on PyTorch) +of state-of-the-art algorithms to enable game developers and hobbyists to easily +train intelligent agents for 2D, 3D and VR/AR games. Researchers can also use the +provided simple-to-use Python API to train Agents using reinforcement learning, +imitation learning, neuroevolution, or any other methods. These trained agents can be +used for multiple purposes, including controlling NPC behavior (in a variety of +settings such as multi-agent and adversarial), automated testing of game builds +and evaluating different game design decisions pre-release. The ML-Agents +Toolkit is mutually beneficial for both game developers and AI researchers as it +provides a central platform where advances in AI can be evaluated on Unity’s +rich environments and then made accessible to the wider research and game +developer communities. + +## Features +- 17+ [example Unity environments](Learning-Environment-Examples.md) +- Support for multiple environment configurations and training scenarios +- Flexible Unity SDK that can be integrated into your game or custom Unity scene +- Support for training single-agent, multi-agent cooperative, and multi-agent + competitive scenarios via several Deep Reinforcement Learning algorithms (PPO, SAC, MA-POCA, self-play). +- Support for learning from demonstrations through two Imitation Learning algorithms (BC and GAIL). +- Quickly and easily add your own [custom training algorithm](Python-Custom-Trainer-Plugin.md) and/or components. +- Easily definable Curriculum Learning scenarios for complex tasks +- Train robust agents using environment randomization +- Flexible agent control with On Demand Decision Making +- Train using multiple concurrent Unity environment instances +- Utilizes the [Inference Engine](Inference-Engine.md) to + provide native cross-platform support +- Unity environment [control from Python](Python-LLAPI.md) +- Wrap Unity learning environments as a [gym](Python-Gym-API.md) environment +- Wrap Unity learning environments as a [PettingZoo](Python-PettingZoo-API.md) environment + +See our [ML-Agents Overview](ML-Agents-Overview.md) page for detailed +descriptions of all these features. 
Or go straight to our [web docs](https://unity-technologies.github.io/ml-agents/). +## Releases & Documentation + +**Our latest, stable release is `Release 22`. Click +[here](Getting-Started.md) +to get started with the latest release of ML-Agents.** + +**You can also check out our new [web docs](https://unity-technologies.github.io/ml-agents/)!** + +The table below lists all our releases, including our `main` branch which is +under active development and may be unstable. A few helpful guidelines: + +- The [Versioning page](Versioning.md) overviews how we manage our GitHub + releases and the versioning process for each of the ML-Agents components. +- The [Releases page](https://github.com/Unity-Technologies/ml-agents/releases) + contains details of the changes between releases. +- The [Migration page](Migrating.md) contains details on how to upgrade + from earlier releases of the ML-Agents Toolkit. +- The **Documentation** links in the table below include installation and usage + instructions specific to each release. Remember to always use the + documentation that corresponds to the release version you're using. +- The `com.unity.ml-agents` package is [verified](https://docs.unity3d.com/2020.1/Documentation/Manual/pack-safe.html) + for Unity 2020.1 and later. Verified packages releases are numbered 1.0.x. + +| **Version** | **Release Date** | **Source** | **Documentation** | **Download** | **Python Package** | **Unity Package** | +|:--------------------------:|:------:|:-------------:|:-------:|:------------:|:------------:|:------------:| +| **Release 22** | **October 5, 2024** | **[source](https://github.com/Unity-Technologies/ml-agents/tree/release_22)** | **[docs](https://unity-technologies.github.io/ml-agents/)** | **[download](https://github.com/Unity-Technologies/ml-agents/archive/release_22.zip)** | **[1.1.0](https://pypi.org/project/mlagents/1.1.0/)** | **[3.0.0](https://docs.unity3d.com/Packages/com.unity.ml-agents@3.0/manual/index.html)** | +| **develop (unstable)** | -- | [source](https://github.com/Unity-Technologies/ml-agents/tree/develop) | [docs](https://unity-technologies.github.io/ml-agents/) | [download](https://github.com/Unity-Technologies/ml-agents/archive/develop.zip) | -- | -- | + + + +If you are a researcher interested in a discussion of Unity as an AI platform, +see a pre-print of our +[reference paper on Unity and the ML-Agents Toolkit](https://arxiv.org/abs/1809.02627). 
+ +If you use Unity or the ML-Agents Toolkit to conduct research, we ask that you +cite the following paper as a reference: + +``` +@article{juliani2020, + title={Unity: A general platform for intelligent agents}, + author={Juliani, Arthur and Berges, Vincent-Pierre and Teng, Ervin and Cohen, Andrew and Harper, Jonathan and Elion, Chris and Goy, Chris and Gao, Yuan and Henry, Hunter and Mattar, Marwan and Lange, Danny}, + journal={arXiv preprint arXiv:1809.02627}, + url={https://arxiv.org/pdf/1809.02627.pdf}, + year={2020} +} +``` + +Additionally, if you use the MA-POCA trainer in your research, we ask that you +cite the following paper as a reference: + +``` +@article{cohen2022, + title={On the Use and Misuse of Absorbing States in Multi-agent Reinforcement Learning}, + author={Cohen, Andrew and Teng, Ervin and Berges, Vincent-Pierre and Dong, Ruo-Ping and Henry, Hunter and Mattar, Marwan and Zook, Alexander and Ganguly, Sujoy}, + journal={RL in Games Workshop AAAI 2022}, + url={http://aaai-rlg.mlanctot.info/papers/AAAI22-RLG_paper_32.pdf}, + year={2022} +} +``` + + + +## Additional Resources + +We have a Unity Learn course, +[ML-Agents: Hummingbirds](https://learn.unity.com/course/ml-agents-hummingbirds), +that provides a gentle introduction to Unity and the ML-Agents Toolkit. + +We've also partnered with +[CodeMonkeyUnity](https://www.youtube.com/c/CodeMonkeyUnity) to create a +[series of tutorial videos](https://www.youtube.com/playlist?list=PLzDRvYVwl53vehwiN_odYJkPBzcqFw110) +on how to implement and use the ML-Agents Toolkit. + +We have also published a series of blog posts that are relevant for ML-Agents: + +- (July 12, 2021) + [ML-Agents plays Dodgeball](https://blog.unity.com/technology/ml-agents-plays-dodgeball) +- (May 5, 2021) + [ML-Agents v2.0 release: Now supports training complex cooperative behaviors](https://blogs.unity3d.com/2021/05/05/ml-agents-v2-0-release-now-supports-training-complex-cooperative-behaviors/) +- (December 28, 2020) + [Happy holidays from the Unity ML-Agents team!](https://blogs.unity3d.com/2020/12/28/happy-holidays-from-the-unity-ml-agents-team/) +- (November 20, 2020) + [How Eidos-Montréal created Grid Sensors to improve observations for training agents](https://blogs.unity3d.com/2020/11/20/how-eidos-montreal-created-grid-sensors-to-improve-observations-for-training-agents/) +- (November 11, 2020) + [2020 AI@Unity interns shoutout](https://blogs.unity3d.com/2020/11/11/2020-aiunity-interns-shoutout/) +- (May 12, 2020) + [Announcing ML-Agents Unity Package v1.0!](https://blogs.unity3d.com/2020/05/12/announcing-ml-agents-unity-package-v1-0/) +- (February 28, 2020) + [Training intelligent adversaries using self-play with ML-Agents](https://blogs.unity3d.com/2020/02/28/training-intelligent-adversaries-using-self-play-with-ml-agents/) +- (November 11, 2019) + [Training your agents 7 times faster with ML-Agents](https://blogs.unity3d.com/2019/11/11/training-your-agents-7-times-faster-with-ml-agents/) +- (October 21, 2019) + [The AI@Unity interns help shape the world](https://blogs.unity3d.com/2019/10/21/the-aiunity-interns-help-shape-the-world/) +- (April 15, 2019) + [Unity ML-Agents Toolkit v0.8: Faster training on real games](https://blogs.unity3d.com/2019/04/15/unity-ml-agents-toolkit-v0-8-faster-training-on-real-games/) +- (March 1, 2019) + [Unity ML-Agents Toolkit v0.7: A leap towards cross-platform inference](https://blogs.unity3d.com/2019/03/01/unity-ml-agents-toolkit-v0-7-a-leap-towards-cross-platform-inference/) +- (December 17, 2018) + [ML-Agents 
Toolkit v0.6: Improved usability of Brains and Imitation Learning](https://blogs.unity3d.com/2018/12/17/ml-agents-toolkit-v0-6-improved-usability-of-brains-and-imitation-learning/) +- (October 2, 2018) + [Puppo, The Corgi: Cuteness Overload with the Unity ML-Agents Toolkit](https://blogs.unity3d.com/2018/10/02/puppo-the-corgi-cuteness-overload-with-the-unity-ml-agents-toolkit/) +- (September 11, 2018) + [ML-Agents Toolkit v0.5, new resources for AI researchers available now](https://blogs.unity3d.com/2018/09/11/ml-agents-toolkit-v0-5-new-resources-for-ai-researchers-available-now/) +- (June 26, 2018) + [Solving sparse-reward tasks with Curiosity](https://blogs.unity3d.com/2018/06/26/solving-sparse-reward-tasks-with-curiosity/) +- (June 19, 2018) + [Unity ML-Agents Toolkit v0.4 and Udacity Deep Reinforcement Learning Nanodegree](https://blogs.unity3d.com/2018/06/19/unity-ml-agents-toolkit-v0-4-and-udacity-deep-reinforcement-learning-nanodegree/) +- (May 24, 2018) + [Imitation Learning in Unity: The Workflow](https://blogs.unity3d.com/2018/05/24/imitation-learning-in-unity-the-workflow/) +- (March 15, 2018) + [ML-Agents Toolkit v0.3 Beta released: Imitation Learning, feedback-driven features, and more](https://blogs.unity3d.com/2018/03/15/ml-agents-v0-3-beta-released-imitation-learning-feedback-driven-features-and-more/) +- (December 11, 2017) + [Using Machine Learning Agents in a real game: a beginner’s guide](https://blogs.unity3d.com/2017/12/11/using-machine-learning-agents-in-a-real-game-a-beginners-guide/) +- (December 8, 2017) + [Introducing ML-Agents Toolkit v0.2: Curriculum Learning, new environments, and more](https://blogs.unity3d.com/2017/12/08/introducing-ml-agents-v0-2-curriculum-learning-new-environments-and-more/) +- (September 19, 2017) + [Introducing: Unity Machine Learning Agents Toolkit](https://blogs.unity3d.com/2017/09/19/introducing-unity-machine-learning-agents/) +- Overviewing reinforcement learning concepts + ([multi-armed bandit](https://blogs.unity3d.com/2017/06/26/unity-ai-themed-blog-entries/) + and + [Q-learning](https://blogs.unity3d.com/2017/08/22/unity-ai-reinforcement-learning-with-q-learning/)) + +### More from Unity + +- [Unity Inference Engine](https://unity.com/products/sentis) +- [Introducing Unity Muse and Sentis](https://blog.unity.com/engine-platform/introducing-unity-muse-and-unity-sentis-ai) + +## Community and Feedback + +The ML-Agents Toolkit is an open-source project and we encourage and welcome +contributions. If you wish to contribute, be sure to review our +[contribution guidelines](CONTRIBUTING.md) and +[code of conduct](CODE_OF_CONDUCT.md). + +For problems with the installation and setup of the ML-Agents Toolkit, or +discussions about how to best setup or train your agents, please create a new +thread on the +[Unity ML-Agents forum](https://forum.unity.com/forums/ml-agents.453/) and make +sure to include as much detail as possible. If you run into any other problems +using the ML-Agents Toolkit or have a specific feature request, please +[submit a GitHub issue](https://github.com/Unity-Technologies/ml-agents/issues). + +Please tell us which samples you would like to see shipped with the ML-Agents Unity +package by replying to +[this forum thread](https://forum.unity.com/threads/feedback-wanted-shipping-sample-s-with-the-ml-agents-package.1073468/). + + +Your opinion matters a great deal to us. Only by hearing your thoughts on the +Unity ML-Agents Toolkit can we continue to improve and grow. 
Please take a few +minutes to +[let us know about it](https://unitysoftware.co1.qualtrics.com/jfe/form/SV_55pQKCZ578t0kbc). + +For any other questions or feedback, connect directly with the ML-Agents team at +ml-agents@unity3d.com. + +## Privacy + +In order to improve the developer experience for Unity ML-Agents Toolkit, we have added in-editor analytics. +Please refer to "Information that is passively collected by Unity" in the +[Unity Privacy Policy](https://unity3d.com/legal/privacy-policy). diff --git a/com.unity.ml-agents/Documentation~/Training-Configuration-File.md b/com.unity.ml-agents/Documentation~/Training-Configuration-File.md new file mode 100644 index 0000000000..1828acb28e --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Training-Configuration-File.md @@ -0,0 +1,232 @@ +# Training Configuration File + +**Table of Contents** + +- [Training Configuration File](#training-configuration-file) + - [Common Trainer Configurations](#common-trainer-configurations) + - [Trainer-specific Configurations](#trainer-specific-configurations) + - [PPO-specific Configurations](#ppo-specific-configurations) + - [SAC-specific Configurations](#sac-specific-configurations) + - [MA-POCA-specific Configurations](#ma-poca-specific-configurations) + - [Reward Signals](#reward-signals) + - [Extrinsic Rewards](#extrinsic-rewards) + - [Curiosity Intrinsic Reward](#curiosity-intrinsic-reward) + - [GAIL Intrinsic Reward](#gail-intrinsic-reward) + - [RND Intrinsic Reward](#rnd-intrinsic-reward) + - [Behavioral Cloning](#behavioral-cloning) + - [Memory-enhanced Agents using Recurrent Neural Networks](#memory-enhanced-agents-using-recurrent-neural-networks) + - [Self-Play](#self-play) + - [Note on Reward Signals](#note-on-reward-signals) + - [Note on Swap Steps](#note-on-swap-steps) + +## Common Trainer Configurations + +One of the first decisions you need to make regarding your training run is which +trainer to use: PPO, SAC, or POCA. There are some training configurations that are +common to both trainers (which we review now) and others that depend on the +choice of the trainer (which we review on subsequent sections). + +| **Setting** | **Description** | +| :----------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `trainer_type` | (default = `ppo`) The type of trainer to use: `ppo`, `sac`, or `poca`. | +| `summary_freq` | (default = `50000`) Number of experiences that needs to be collected before generating and displaying training statistics. This determines the granularity of the graphs in Tensorboard. | +| `time_horizon` | (default = `64`) How many steps of experience to collect per-agent before adding it to the experience buffer. When this limit is reached before the end of an episode, a value estimate is used to predict the overall expected reward from the agent's current state. 
As such, this parameter trades off between a less biased, but higher variance estimate (long time horizon) and a more biased, but less varied estimate (short time horizon). In cases where there are frequent rewards within an episode, or episodes are prohibitively large, a smaller number may be preferable. This number should be large enough to capture all the important behavior within a sequence of an agent's actions.

Typical range: `32` - `2048` | +| `max_steps` | (default = `500000`) Total number of steps (i.e., observation collected and action taken) that must be taken in the environment (or across all environments if using multiple in parallel) before ending the training process. If you have multiple agents with the same behavior name within your environment, all steps taken by those agents will contribute to the same `max_steps` count.

Typical range: `5e5` - `1e7` | +| `keep_checkpoints` | (default = `5`) The maximum number of model checkpoints to keep. Checkpoints are saved after the number of steps specified by the `checkpoint_interval` option. Once the maximum number of checkpoints has been reached, the oldest checkpoint is deleted when saving a new checkpoint. | +| `even_checkpoints` | (default = `false`) If set to true, ignores `checkpoint_interval` and evenly distributes checkpoints throughout training based on `keep_checkpoints` and `max_steps`, i.e. `checkpoint_interval = max_steps / keep_checkpoints`. Useful for cataloging agent behavior throughout training. | +| `checkpoint_interval` | (default = `500000`) The number of experiences collected between each checkpoint by the trainer. A maximum of `keep_checkpoints` checkpoints are saved before old ones are deleted. Each checkpoint saves the `.onnx` files in the `results/` folder. | +| `init_path` | (default = None) Initialize trainer from a previously saved model. Note that the prior run should have used the same trainer configurations as the current run, and have been saved with the same version of ML-Agents.

You can provide either the file name or the full path to the checkpoint, e.g. `{checkpoint_name.pt}` or `./models/{run-id}/{behavior_name}/{checkpoint_name.pt}`. This option is provided in case you want to initialize different behaviors from different runs or initialize from an older checkpoint; in most cases, it is sufficient to use the `--initialize-from` CLI parameter to initialize all models from the same run. | +| `threaded` | (default = `false`) Allow environments to step while updating the model. This might result in a training speedup, especially when using SAC. For best performance, leave this setting as `false` when using self-play. | +| `hyperparameters -> learning_rate` | (default = `3e-4`) Initial learning rate for gradient descent. Corresponds to the strength of each gradient descent update step. This should typically be decreased if training is unstable and the reward does not consistently increase.

Typical range: `1e-5` - `1e-3` | +| `hyperparameters -> batch_size` | Number of experiences in each iteration of gradient descent. **This should always be multiple times smaller than `buffer_size`**. If you are using continuous actions, this value should be large (on the order of 1000s). If you are using only discrete actions, this value should be smaller (on the order of 10s).

Typical range: (Continuous - PPO): `512` - `5120`; (Continuous - SAC): `128` - `1024`; (Discrete, PPO & SAC): `32` - `512`. | +| `hyperparameters -> buffer_size` | (default = `10240` for PPO and `50000` for SAC)
**PPO:** Number of experiences to collect before updating the policy model. Corresponds to how many experiences should be collected before we do any learning or updating of the model. **This should be multiple times larger than `batch_size`**. Typically a larger `buffer_size` corresponds to more stable training updates.
**SAC:** The max size of the experience buffer - on the order of thousands of times longer than your episodes, so that SAC can learn from old as well as new experiences.

Typical range: PPO: `2048` - `409600`; SAC: `50000` - `1000000` | +| `hyperparameters -> learning_rate_schedule` | (default = `linear` for PPO and `constant` for SAC) Determines how learning rate changes over time. For PPO, we recommend decaying learning rate until max_steps so learning converges more stably. However, for some cases (e.g. training for an unknown amount of time) this feature can be disabled. For SAC, we recommend holding learning rate constant so that the agent can continue to learn until its Q function converges naturally.

`linear` decays the learning_rate linearly, reaching 0 at max_steps, while `constant` keeps the learning rate constant for the entire training run. | +| `network_settings -> hidden_units` | (default = `128`) Number of units in the hidden layers of the neural network. Corresponds to how many units are in each fully connected layer of the neural network. For simple problems where the correct action is a straightforward combination of the observation inputs, this should be small. For problems where the action is a very complex interaction between the observation variables, this should be larger.

Typical range: `32` - `512` | +| `network_settings -> num_layers` | (default = `2`) The number of hidden layers in the neural network. Corresponds to how many hidden layers are present after the observation input, or after the CNN encoding of the visual observation. For simple problems, fewer layers are likely to train faster and more efficiently. More layers may be necessary for more complex control problems.

Typical range: `1` - `3` | +| `network_settings -> normalize` | (default = `false`) Whether normalization is applied to the vector observation inputs. This normalization is based on the running average and variance of the vector observation. Normalization can be helpful in cases with complex continuous control problems, but may be harmful with simpler discrete control problems. | +| `network_settings -> vis_encode_type` | (default = `simple`) Encoder type for encoding visual observations.

`simple` (default) uses a simple encoder which consists of two convolutional layers, `nature_cnn` uses the CNN implementation proposed by [Mnih et al.](https://www.nature.com/articles/nature14236), consisting of three convolutional layers, and `resnet` uses the [IMPALA Resnet](https://arxiv.org/abs/1802.01561) consisting of three stacked layers, each with two residual blocks, making a much larger network than the other two. `match3` is a smaller CNN ([Gudmundsoon et al.](https://www.researchgate.net/publication/328307928_Human-Like_Playtesting_with_Deep_Learning)) that can capture more granular spatial relationships and is optimized for board games. `fully_connected` uses a single fully connected dense layer as encoder without any convolutional layers.

Due to the size of the convolution kernels, there is a minimum observation size limitation that each encoder type can handle - `simple`: 20x20, `nature_cnn`: 36x36, `resnet`: 15x15, `match3`: 5x5. `fully_connected` doesn't have convolutional layers and thus no size limits, but since it has less representation power it should be reserved for very small inputs. Note that using the `match3` CNN with very large visual inputs might result in a huge observation encoding and thus potentially slow down training or cause memory issues. | +| `network_settings -> goal_conditioning_type` | (default = `hyper`) Conditioning type for the policy using goal observations.

`none` treats the goal observations as regular observations, `hyper` (default) uses a HyperNetwork with goal observations as input to generate some of the weights of the policy. Note that when using `hyper` the number of parameters of the network increases greatly. Therefore, it is recommended to reduce the number of `hidden_units` when using this `goal_conditioning_type`. | + + +## Trainer-specific Configurations + +Depending on your choice of a trainer, there are additional trainer-specific +configurations. We present them below in two separate tables, but keep in mind +that you only need to include the configurations for the trainer selected (i.e. +the `trainer_type` setting above). + +### PPO-specific Configurations + +| **Setting** | **Description** | +| :---------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `hyperparameters -> beta` | (default = `5.0e-3`) Strength of the entropy regularization, which makes the policy "more random." This ensures that agents properly explore the action space during training. Increasing this will ensure more random actions are taken. This should be adjusted such that the entropy (measurable from TensorBoard) slowly decreases alongside increases in reward. If entropy drops too quickly, increase `beta`. If entropy drops too slowly, decrease `beta`.

Typical range: `1e-4` - `1e-2` | +| `hyperparameters -> epsilon` | (default = `0.2`) Influences how rapidly the policy can evolve during training. Corresponds to the acceptable threshold of divergence between the old and new policies during gradient descent updating. Setting this value small will result in more stable updates, but will also slow the training process.

Typical range: `0.1` - `0.3` | +| `hyperparameters -> beta_schedule` | (default = `learning_rate_schedule`) Determines how beta changes over time.

`linear` decays beta linearly, reaching 0 at max_steps, while `constant` keeps beta constant for the entire training run. If not explicitly set, the default beta schedule will be set to `hyperparameters -> learning_rate_schedule`. | +| `hyperparameters -> epsilon_schedule` | (default = `learning_rate_schedule`) Determines how epsilon changes over time (PPO only).

`linear` decays epsilon linearly, reaching 0 at max_steps, while `constant` keeps epsilon constant for the entire training run. If not explicitly set, the default epsilon schedule will be set to `hyperparameters -> learning_rate_schedule`. | +| `hyperparameters -> lambd` | (default = `0.95`) Regularization parameter (lambda) used when calculating the Generalized Advantage Estimate ([GAE](https://arxiv.org/abs/1506.02438)). This can be thought of as how much the agent relies on its current value estimate when calculating an updated value estimate. Low values correspond to relying more on the current value estimate (which can be high bias), and high values correspond to relying more on the actual rewards received in the environment (which can be high variance). The parameter provides a trade-off between the two, and the right value can lead to a more stable training process.

Typical range: `0.9` - `0.95` | +| `hyperparameters -> num_epoch` | (default = `3`) Number of passes to make through the experience buffer when performing gradient descent optimization. The larger the `batch_size`, the larger this value can acceptably be. Decreasing this will ensure more stable updates, at the cost of slower learning.

Typical range: `3` - `10` | +| `hyperparameters -> shared_critic` | (default = `False`) Whether or not the policy and value function networks share a backbone. It may be useful to use a shared backbone when learning from image observations. + +### SAC-specific Configurations + +| **Setting** | **Description** | +| :------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `hyperparameters -> buffer_init_steps` | (default = `0`) Number of experiences to collect into the buffer before updating the policy model. As the untrained policy is fairly random, pre-filling the buffer with random actions is useful for exploration. Typically, at least several episodes of experiences should be pre-filled.

Typical range: `1000` - `10000` | +| `hyperparameters -> init_entcoef` | (default = `1.0`) How much the agent should explore in the beginning of training. Corresponds to the initial entropy coefficient set at the beginning of training. In SAC, the agent is incentivized to make its actions entropic to facilitate better exploration. The entropy coefficient weighs the true reward with a bonus entropy reward. The entropy coefficient is [automatically adjusted](https://arxiv.org/abs/1812.05905) to a preset target entropy, so the `init_entcoef` only corresponds to the starting value of the entropy bonus. Increase init_entcoef to explore more in the beginning, decrease to converge to a solution faster.

Typical range: (Continuous): `0.5` - `1.0`; (Discrete): `0.05` - `0.5` | +| `hyperparameters -> save_replay_buffer` | (default = `false`) Whether to save and load the experience replay buffer as well as the model when quitting and re-starting training. This may help resumes go more smoothly, as the experiences collected won't be wiped. Note that replay buffers can be very large, and will take up a considerable amount of disk space. For that reason, we disable this feature by default. | +| `hyperparameters -> tau` | (default = `0.005`) How aggressively to update the target network used for bootstrapping value estimation in SAC. Corresponds to the magnitude of the target Q update during the SAC model update. In SAC, there are two neural networks: the target and the policy. The target network is used to bootstrap the policy's estimate of the future rewards at a given state, and is fixed while the policy is being updated. This target is then slowly updated according to tau. Typically, this value should be left at 0.005. For simple problems, increasing tau to 0.01 might reduce the time it takes to learn, at the cost of stability.

Typical range: `0.005` - `0.01` | +| `hyperparameters -> steps_per_update` | (default = `1`) Average ratio of agent steps (actions) taken to updates made of the agent's policy. In SAC, a single "update" corresponds to grabbing a batch of size `batch_size` from the experience replay buffer, and using this mini batch to update the models. Note that it is not guaranteed that after exactly `steps_per_update` steps an update will be made, only that the ratio will hold true over many steps. Typically, `steps_per_update` should be greater than or equal to 1. Note that setting `steps_per_update` lower will improve sample efficiency (reduce the number of steps required to train) but increase the CPU time spent performing updates. For most environments where steps are fairly fast (e.g. our example environments) `steps_per_update` equal to the number of agents in the scene is a good balance. For slow environments (steps take 0.1 seconds or more) reducing `steps_per_update` may improve training speed. We can also change `steps_per_update` to lower than 1 to update more often than once per step, though this will usually result in a slowdown unless the environment is very slow.

Typical range: `1` - `20` | +| `hyperparameters -> reward_signal_num_update` | (default = `steps_per_update`) Number of steps per mini batch sampled and used for updating the reward signals. By default, we update the reward signals once every time the main policy is updated. However, to imitate the training procedure in certain imitation learning papers (e.g. [Kostrikov et. al](http://arxiv.org/abs/1809.02925), [Blondé et. al](http://arxiv.org/abs/1809.02064)), we may want to update the reward signal (GAIL) M times for every update of the policy. We can change `steps_per_update` of SAC to N, as well as `reward_signal_steps_per_update` under `reward_signals` to N / M to accomplish this. By default, `reward_signal_steps_per_update` is set to `steps_per_update`. | + +### MA-POCA-specific Configurations +MA-POCA uses the same configurations as PPO, and there are no additional POCA-specific parameters. + +**NOTE**: Reward signals other than Extrinsic Rewards have not been extensively tested with MA-POCA, +though they can still be added and used for training on a your-mileage-may-vary basis. + +## Reward Signals + +The `reward_signals` section enables the specification of settings for both +extrinsic (i.e. environment-based) and intrinsic reward signals (e.g. curiosity +and GAIL). Each reward signal should define at least two parameters, `strength` +and `gamma`, in addition to any class-specific hyperparameters. Note that to +remove a reward signal, you should delete its entry entirely from +`reward_signals`. At least one reward signal should be left defined at all +times. Provide the following configurations to design the reward signal for your +training run. + +### Extrinsic Rewards + +Enable these settings to ensure that your training run incorporates your +environment-based reward signal: + +| **Setting** | **Description** | +| :---------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `extrinsic -> strength` | (default = `1.0`) Factor by which to multiply the reward given by the environment. Typical ranges will vary depending on the reward signal.

Typical range: `1.00` | +| `extrinsic -> gamma` | (default = `0.99`) Discount factor for future rewards coming from the environment. This can be thought of as how far into the future the agent should care about possible rewards. In situations when the agent should be acting in the present in order to prepare for rewards in the distant future, this value should be large. In cases when rewards are more immediate, it can be smaller. Must be strictly smaller than 1.

Typical range: `0.8` - `0.995` | + +### Curiosity Intrinsic Reward + +To enable curiosity, provide these settings: + +| **Setting** | **Description** | +| :--------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `curiosity -> strength` | (default = `1.0`) Magnitude of the curiosity reward generated by the intrinsic curiosity module. This should be scaled in order to ensure it is large enough to not be overwhelmed by extrinsic reward signals in the environment. Likewise it should not be too large to overwhelm the extrinsic reward signal.

Typical range: `0.001` - `0.1` | +| `curiosity -> gamma` | (default = `0.99`) Discount factor for future rewards.

Typical range: `0.8` - `0.995` | +| `curiosity -> network_settings` | Please see the documentation for `network_settings` under [Common Trainer Configurations](#common-trainer-configurations). The network specs used by the intrinsic curiosity model. The value of `hidden_units` should be small enough to encourage the ICM to compress the original observation, but not so small that it cannot learn to differentiate between expected and actual observations.

Typical range: `64` - `256` | +| `curiosity -> learning_rate` | (default = `3e-4`) Learning rate used to update the intrinsic curiosity module. This should typically be decreased if training is unstable, and the curiosity loss is unstable.

Typical range: `1e-5` - `1e-3` | + +### GAIL Intrinsic Reward + +To enable GAIL (assuming you have recorded demonstrations), provide these +settings: + +| **Setting** | **Description** | +| :---------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `gail -> strength` | (default = `1.0`) Factor by which to multiply the raw reward. Note that when using GAIL with an Extrinsic Signal, this value should be set lower if your demonstrations are suboptimal (e.g. from a human), so that a trained agent will focus on receiving extrinsic rewards instead of exactly copying the demonstrations. Keep the strength below about 0.1 in those cases.

Typical range: `0.01` - `1.0` | +| `gail -> gamma` | (default = `0.99`) Discount factor for future rewards.

Typical range: `0.8` - `0.9` | +| `gail -> demo_path` | (Required, no default) The path to your .demo file or directory of .demo files. | +| `gail -> network_settings` | Please see the documentation for `network_settings` under [Common Trainer Configurations](#common-trainer-configurations). The network specs for the GAIL discriminator. The value of `hidden_units` should be small enough to encourage the discriminator to compress the original observation, but not so small that it cannot learn to differentiate between demonstrated and actual behavior. Dramatically increasing this size will also negatively affect training times.

Typical range: `64` - `256` | +| `gail -> learning_rate` | (Optional, default = `3e-4`) Learning rate used to update the discriminator. This should typically be decreased if training is unstable, and the GAIL loss is unstable.

Typical range: `1e-5` - `1e-3` | +| `gail -> use_actions` | (default = `false`) Determines whether the discriminator should discriminate based on both observations and actions, or just observations. Set to True if you want the agent to mimic the actions from the demonstrations, and False if you'd rather have the agent visit the same states as in the demonstrations but with possibly different actions. Setting to False is more likely to be stable, especially with imperfect demonstrations, but may learn slower. | +| `gail -> use_vail` | (default = `false`) Enables a variational bottleneck within the GAIL discriminator. This forces the discriminator to learn a more general representation and reduces its tendency to be "too good" at discriminating, making learning more stable. However, it does increase training time. Enable this if you notice your imitation learning is unstable, or unable to learn the task at hand. | + +### RND Intrinsic Reward + +Random Network Distillation (RND) is only available for the PyTorch trainers. +To enable RND, provide these settings: + +| **Setting** | **Description** | +| :--------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `rnd -> strength` | (default = `1.0`) Magnitude of the curiosity reward generated by the intrinsic rnd module. This should be scaled in order to ensure it is large enough to not be overwhelmed by extrinsic reward signals in the environment. Likewise it should not be too large to overwhelm the extrinsic reward signal.

Typical range: `0.001` - `0.01` | +| `rnd -> gamma` | (default = `0.99`) Discount factor for future rewards.

Typical range: `0.8` - `0.995` | +| `rnd -> network_settings` | Please see the documentation for `network_settings` under [Common Trainer Configurations](#common-trainer-configurations). The network specs for the RND model. | +| `rnd -> learning_rate` | (default = `3e-4`) Learning rate used to update the RND module. This should be large enough for the RND module to quickly learn the state representation, but small enough to allow for stable learning.

Typical range: `1e-5` - `1e-3` + + +## Behavioral Cloning + +To enable Behavioral Cloning as a pre-training option (assuming you have +recorded demonstrations), provide the following configurations under the +`behavioral_cloning` section: + +| **Setting** | **Description** | +| :------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `demo_path` | (Required, no default) The path to your .demo file or directory of .demo files. | +| `strength` | (default = `1.0`) Learning rate of the imitation relative to the learning rate of PPO, and roughly corresponds to how strongly we allow BC to influence the policy.

Typical range: `0.1` - `0.5` | +| `steps` | (default = `0`) During BC, it is often desirable to stop using demonstrations after the agent has "seen" rewards, and allow it to optimize past the available demonstrations and/or generalize outside of the provided demonstrations. steps corresponds to the training steps over which BC is active. The learning rate of BC will anneal over the steps. Set the steps to 0 for constant imitation over the entire training run. | +| `batch_size` | (default = `batch_size` of trainer) Number of demonstration experiences used for one iteration of a gradient descent update. If not specified, it will default to the `batch_size` of the trainer.

Typical range: (Continuous): `512` - `5120`; (Discrete): `32` - `512` | +| `num_epoch` | (default = `num_epoch` of trainer) Number of passes through the experience buffer during gradient descent. If not specified, it will default to the number of epochs set for PPO.

Typical range: `3` - `10` | +| `samples_per_update` | (default = `0`) Maximum number of samples to use during each imitation update. You may want to lower this if your demonstration dataset is very large to avoid overfitting the policy on demonstrations. Set to 0 to train over all of the demonstrations at each update step.

Typical range: `buffer_size` + +## Memory-enhanced Agents using Recurrent Neural Networks + +You can enable your agents to use memory by adding a `memory` section under `network_settings`, +and setting `memory_size` and `sequence_length`: + +| **Setting** | **Description** | +| :---------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `network_settings -> memory -> memory_size` | (default = `128`) Size of the memory an agent must keep. In order to use a LSTM, training requires a sequence of experiences instead of single experiences. Corresponds to the size of the array of floating point numbers used to store the hidden state of the recurrent neural network of the policy. This value must be a multiple of 2, and should scale with the amount of information you expect the agent will need to remember in order to successfully complete the task.

Typical range: `32` - `256` | +| `network_settings -> memory -> sequence_length` | (default = `64`) Defines how long the sequences of experiences must be while training. Note that if this number is too small, the agent will not be able to remember things over longer periods of time. If this number is too large, the neural network will take longer to train.

Typical range: `4` - `128` | + +A few considerations when deciding to use memory: + +- LSTM does not work well with continuous actions. Please use + discrete actions for better results. +- Adding a recurrent layer increases the complexity of the neural network, it is + recommended to decrease `num_layers` when using recurrent. +- It is required that `memory_size` be divisible by 2. + +## Self-Play + +Training with self-play adds additional confounding factors to the usual issues +faced by reinforcement learning. In general, the tradeoff is between the skill +level and generality of the final policy and the stability of learning. Training +against a set of slowly or unchanging adversaries with low diversity results in +a more stable learning process than training against a set of quickly changing +adversaries with high diversity. With this context, this guide discusses the +exposed self-play hyperparameters and intuitions for tuning them. + +If your environment contains multiple agents that are divided into teams, you +can leverage our self-play training option by providing these configurations for +each Behavior: + +| **Setting** | **Description** | +| :-------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `save_steps` | (default = `20000`) Number of _trainer steps_ between snapshots. For example, if `save_steps=10000` then a snapshot of the current policy will be saved every `10000` trainer steps. Note, trainer steps are counted per agent. For more information, please see the [migration doc](Migrating.md) after v0.13.

A larger value of `save_steps` will yield a set of opponents that cover a wider range of skill levels and possibly play styles, since the policy receives more training. As a result, the agent trains against a wider variety of opponents. Learning a policy to defeat more diverse opponents is a harder problem and so may require more overall training steps, but may also lead to a more general and robust policy at the end of training. This value is also dependent on how intrinsically difficult the environment is for the agent.

Typical range: `10000` - `100000` | +| `team_change` | (default = `5 * save_steps`) Number of _trainer steps_ between switching the learning team. This is the number of trainer steps the teams associated with a specific ghost trainer will train before a different team becomes the new learning team. It is possible that, in asymmetric games, opposing teams require fewer trainer steps to make similar performance gains. This enables users to train a more complicated team of agents for more trainer steps per team switch than a simpler team of agents.

A larger value of `team_change` will allow the agent to train longer against its opponents. The longer an agent trains against the same set of opponents, the more able it will be to defeat them. However, training against them for too long may result in overfitting to the particular opponent strategies, and so the agent may fail against the next batch of opponents.

The value of `team_change` will determine how many snapshots of the agent's policy are saved to be used as opponents for the other team. We therefore recommend setting this value as a function of the `save_steps` parameter discussed previously.

Typical range: 4x-10x where x=`save_steps` | +| `swap_steps` | (default = `10000`) Number of _ghost steps_ (not trainer steps) between swapping the opponent's policy with a different snapshot. A 'ghost step' refers to a step taken by an agent _that is following a fixed policy and not learning_. The reason for this distinction is that in asymmetric games, we may have teams with an unequal number of agents, e.g. a 2v1 scenario like our Strikers Vs Goalie example environment. The team with two agents collects twice as many agent steps per environment step as the team with one agent. Thus, these two values will need to be distinct to ensure that the same number of trainer steps corresponds to the same number of opponent swaps for each team. The formula for `swap_steps`, if a user desires `x` swaps of a team with `num_agents` agents against an opponent team with `num_opponent_agents` agents during `team_change` total steps, is: `(num_agents / num_opponent_agents) * (team_change / x)`

Typical range: `10000` - `100000` | +| `play_against_latest_model_ratio` | (default = `0.5`) Probability an agent will play against the latest opponent policy. With probability 1 - `play_against_latest_model_ratio`, the agent will play against a snapshot of its opponent from a past iteration.

A larger value of `play_against_latest_model_ratio` indicates that an agent will be playing against the current opponent more often. Since the agent is updating its policy, the opponent will be different from iteration to iteration. This can lead to an unstable learning environment, but poses the agent with an [auto-curriculum](https://openai.com/research/emergent-tool-use) of increasingly challenging situations, which may lead to a stronger final policy.

Typical range: `0.0` - `1.0` | +| `window` | (default = `10`) Size of the sliding window of past snapshots from which the agent's opponents are sampled. For example, a `window` size of 5 will save the last 5 snapshots taken. Each time a new snapshot is taken, the oldest is discarded. A larger value of `window` means that an agent's pool of opponents will contain a larger diversity of behaviors, since it will contain policies from earlier in the training run. As with the `save_steps` hyperparameter, the agent trains against a wider variety of opponents. Learning a policy to defeat more diverse opponents is a harder problem and so may require more overall training steps, but may also lead to a more general and robust policy at the end of training.

Typical range: `5` - `30` | + +### Note on Reward Signals + +We make the assumption that the final reward in a trajectory corresponds to the +outcome of an episode. A final reward of +1 indicates winning, -1 indicates +losing and 0 indicates a draw. The ELO calculation (discussed below) depends on +this final reward being either +1, 0, -1. + +The reward signal should still be used as described in the documentation for the +other trainers. However, we encourage users to be a bit more conservative when +shaping reward functions due to the instability and non-stationarity of learning +in adversarial games. Specifically, we encourage users to begin with the +simplest possible reward function (+1 winning, -1 losing) and to allow for more +iterations of training to compensate for the sparsity of reward. + +### Note on Swap Steps + +As an example, in a 2v1 scenario, if we want the swap to occur x=4 times during +team-change=200000 steps, the swap_steps for the team of one agent is: + +swap_steps = (1 / 2) \* (200000 / 4) = 25000 The swap_steps for the team of two +agents is: + +swap_steps = (2 / 1) \* (200000 / 4) = 100000 Note, with equal team sizes, the +first term is equal to 1 and swap_steps can be calculated by just dividing the +total steps by the desired number of swaps. + +A larger value of swap_steps means that an agent will play against the same +fixed opponent for a longer number of training iterations. This results in a +more stable training scenario, but leaves the agent open to the risk of +overfitting it's behavior for this particular opponent. Thus, when a new +opponent is swapped, the agent may lose more often than expected. diff --git a/com.unity.ml-agents/Documentation~/Training-ML-Agents.md b/com.unity.ml-agents/Documentation~/Training-ML-Agents.md new file mode 100644 index 0000000000..dbd85e30d8 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Training-ML-Agents.md @@ -0,0 +1,624 @@ +# Training ML-Agents + +**Table of Contents** + +- [Training ML-Agents](#training-ml-agents) + - [Training with mlagents-learn](#training-with-mlagents-learn) + - [Starting Training](#starting-training) + - [Observing Training](#observing-training) + - [Stopping and Resuming Training](#stopping-and-resuming-training) + - [Loading an Existing Model](#loading-an-existing-model) + - [Training Configurations](#training-configurations) + - [Adding CLI Arguments to the Training Configuration file](#adding-cli-arguments-to-the-training-configuration-file) + - [Environment settings](#environment-settings) + - [Engine settings](#engine-settings) + - [Checkpoint settings](#checkpoint-settings) + - [Torch settings:](#torch-settings) + - [Behavior Configurations](#behavior-configurations) + - [Default Behavior Settings](#default-behavior-settings) + - [Environment Parameters](#environment-parameters) + - [Environment Parameter Randomization](#environment-parameter-randomization) + - [Supported Sampler Types](#supported-sampler-types) + - [Training with Environment Parameter Randomization](#training-with-environment-parameter-randomization) + - [Curriculum](#curriculum) + - [Training with a Curriculum](#training-with-a-curriculum) + - [Training Using Concurrent Unity Instances](#training-using-concurrent-unity-instances) + +For a broad overview of reinforcement learning, imitation learning and all the +training scenarios, methods and options within the ML-Agents Toolkit, see +[ML-Agents Toolkit Overview](ML-Agents-Overview.md). 
+ +Once your learning environment has been created and is ready for training, the +next step is to initiate a training run. Training in the ML-Agents Toolkit is +powered by a dedicated Python package, `mlagents`. This package exposes a +command `mlagents-learn` that is the single entry point for all training +workflows (e.g. reinforcement leaning, imitation learning, curriculum learning). +Its implementation can be found at +[ml-agents/mlagents/trainers/learn.py](../ml-agents/mlagents/trainers/learn.py). + +## Training with mlagents-learn + +### Starting Training + +`mlagents-learn` is the main training utility provided by the ML-Agents Toolkit. +It accepts a number of CLI options in addition to a YAML configuration file that +contains all the configurations and hyperparameters to be used during training. +The set of configurations and hyperparameters to include in this file depend on +the agents in your environment and the specific training method you wish to +utilize. Keep in mind that the hyperparameter values can have a big impact on +the training performance (i.e. your agent's ability to learn a policy that +solves the task). In this page, we will review all the hyperparameters for all +training methods and provide guidelines and advice on their values. + +To view a description of all the CLI options accepted by `mlagents-learn`, use +the `--help`: + +```sh +mlagents-learn --help +``` + +The basic command for training is: + +```sh +mlagents-learn --env= --run-id= +``` + +where + +- `` is the file path of the trainer configuration YAML. + This contains all the hyperparameter values. We offer a detailed guide on the + structure of this file and the meaning of the hyperparameters (and advice on + how to set them) in the dedicated + [Training Configurations](#training-configurations) section below. +- ``**(Optional)** is the name (including path) of your + [Unity executable](Learning-Environment-Executable.md) containing the agents + to be trained. If `` is not passed, the training will happen in the + Editor. Press the **Play** button in Unity when the message _"Start training + by pressing the Play button in the Unity Editor"_ is displayed on the screen. +- `` is a unique name you can use to identify the results of + your training runs. + +See the +[Getting Started Guide](Getting-Started.md#training-a-new-model-with-reinforcement-learning) +for a sample execution of the `mlagents-learn` command. + +#### Observing Training + +Regardless of which training methods, configurations or hyperparameters you +provide, the training process will always generate three artifacts, all found +in the `results/` folder: + +1. Summaries: these are training metrics that + are updated throughout the training process. They are helpful to monitor your + training performance and may help inform how to update your hyperparameter + values. See [Using TensorBoard](Using-Tensorboard.md) for more details on how + to visualize the training metrics. +1. Models: these contain the model checkpoints that + are updated throughout training and the final model file (`.onnx`). This final + model file is generated once either when training completes or is + interrupted. +1. Timers file (under `results//run_logs`): this contains aggregated + metrics on your training process, including time spent on specific code + blocks. See [Profiling in Python](Profiling-Python.md) for more information + on the timers generated. 
+ +These artifacts are updated throughout the training +process and finalized when training is completed or is interrupted. + +#### Stopping and Resuming Training + +To interrupt training and save the current progress, hit `Ctrl+C` once and wait +for the model(s) to be saved out. + +To resume a previously interrupted or completed training run, use the `--resume` +flag and make sure to specify the previously used run ID. + +If you would like to re-run a previously interrupted or completed training run +and re-use the same run ID (in this case, overwriting the previously generated +artifacts), then use the `--force` flag. + +#### Loading an Existing Model + +You can also use this mode to run inference of an already-trained model in +Python by using both the `--resume` and `--inference` flags. Note that if you +want to run inference in Unity, you should use the +[Inference Engine](Getting-Started.md#running-a-pre-trained-model). + +Additionally, if the network architecture changes, you may still load an existing model, +but ML-Agents will only load the parts of the model it can load and ignore all others. For instance, +if you add a new reward signal, the existing model will load but the new reward signal +will be initialized from scratch. If you have a model with a visual encoder (CNN) but +change the `hidden_units`, the CNN will be loaded but the body of the network will be +initialized from scratch. + +Alternatively, you might want to start a new training run but _initialize_ it +using an already-trained model. You may want to do this, for instance, if your +environment changed and you want a new model, but the old behavior is still +better than random. You can do this by specifying +`--initialize-from=`, where `` is the old run +ID. + +## Training Configurations + +The Unity ML-Agents Toolkit provides a wide range of training scenarios, methods +and options. As such, specific training runs may require different training +configurations and may generate different artifacts and TensorBoard statistics. +This section offers a detailed guide into how to manage the different training +set-ups withing the toolkit. + +More specifically, this section offers a detailed guide on the command-line +flags for `mlagents-learn` that control the training configurations: + +- ``: defines the training hyperparameters for each + Behavior in the scene, and the set-ups for the environment parameters + (Curriculum Learning and Environment Parameter Randomization) + +It is important to highlight that successfully training a Behavior in the +ML-Agents Toolkit involves tuning the training hyperparameters and +configuration. This guide contains some best practices for tuning the training +process when the default parameters don't seem to be giving the level of +performance you would like. We provide sample configuration files for our +example environments in the [config/](../config/) directory. The +`config/ppo/3DBall.yaml` was used to train the 3D Balance Ball in the +[Getting Started](Getting-Started.md) guide. That configuration file uses the +PPO trainer, but we also have configuration files for SAC and GAIL. + +Additionally, the set of configurations you provide depend on the training +functionalities you use (see [ML-Agents Toolkit Overview](ML-Agents-Overview.md) +for a description of all the training functionalities). Each functionality you +add typically has its own training configurations. For instance: + +- Use PPO or SAC? +- Use Recurrent Neural Networks for adding memory to your agents? 
+- Use the intrinsic curiosity module? +- Ignore the environment reward signal? +- Pre-train using behavioral cloning? (Assuming you have recorded + demonstrations.) +- Include the GAIL intrinsic reward signals? (Assuming you have recorded + demonstrations.) +- Use self-play? (Assuming your environment includes multiple agents.) + +The trainer config file, ``, determines the features you will +use during training, and the answers to the above questions will dictate its contents. +The rest of this guide breaks down the different sub-sections of the trainer config file +and explains the possible settings for each. If you need a list of all the trainer +configurations, please see [Training Configuration File](Training-Configuration-File.md). + +**NOTE:** The configuration file format has been changed between 0.17.0 and 0.18.0 and +between 0.18.0 and onwards. To convert +an old set of configuration files (trainer config, curriculum, and sampler files) to the new +format, a script has been provided. Run `python -m mlagents.trainers.upgrade_config -h` in your +console to see the script's usage. + +### Adding CLI Arguments to the Training Configuration file + +Additionally, within the training configuration YAML file, you can also add the +CLI arguments (such as `--num-envs`). + +Reminder that a detailed description of all the CLI arguments can be found by +using the help utility: + +```sh +mlagents-learn --help +``` + +These additional CLI arguments are grouped into environment, engine, checkpoint and torch. +The available settings and example values are shown below. + +#### Environment settings + +```yaml +env_settings: + env_path: FoodCollector + env_args: null + base_port: 5005 + num_envs: 1 + timeout_wait: 10 + seed: -1 + max_lifetime_restarts: 10 + restarts_rate_limit_n: 1 + restarts_rate_limit_period_s: 60 +``` + +#### Engine settings + +```yaml +engine_settings: + width: 84 + height: 84 + quality_level: 5 + time_scale: 20 + target_frame_rate: -1 + capture_frame_rate: 60 + no_graphics: false +``` + +#### Checkpoint settings + +```yaml +checkpoint_settings: + run_id: foodtorch + initialize_from: null + load_model: false + resume: false + force: true + train_model: false + inference: false +``` + +#### Torch settings: + +```yaml +torch_settings: + device: cpu +``` + +### Behavior Configurations + +The primary section of the trainer config file is a +set of configurations for each Behavior in your scene. These are defined under +the sub-section `behaviors` in your trainer config file. Some of the +configurations are required while others are optional. To help us get started, +below is a sample file that includes all the possible settings if we're using a +PPO trainer with all the possible training functionalities enabled (memory, +behavioral cloning, curiosity, GAIL and self-play). You will notice that +curriculum and environment parameter randomization settings are not part of the `behaviors` +configuration, but in their own section called `environment_parameters`. 
+ +```yaml +behaviors: + BehaviorPPO: + trainer_type: ppo + + hyperparameters: + # Hyperparameters common to PPO and SAC + batch_size: 1024 + buffer_size: 10240 + learning_rate: 3.0e-4 + learning_rate_schedule: linear + + # PPO-specific hyperparameters + beta: 5.0e-3 + beta_schedule: constant + epsilon: 0.2 + epsilon_schedule: linear + lambd: 0.95 + num_epoch: 3 + shared_critic: False + + # Configuration of the neural network (common to PPO/SAC) + network_settings: + vis_encode_type: simple + normalize: false + hidden_units: 128 + num_layers: 2 + # memory + memory: + sequence_length: 64 + memory_size: 256 + + # Trainer configurations common to all trainers + max_steps: 5.0e5 + time_horizon: 64 + summary_freq: 10000 + keep_checkpoints: 5 + checkpoint_interval: 50000 + threaded: false + init_path: null + + # behavior cloning + behavioral_cloning: + demo_path: Project/Assets/ML-Agents/Examples/Pyramids/Demos/ExpertPyramid.demo + strength: 0.5 + steps: 150000 + batch_size: 512 + num_epoch: 3 + samples_per_update: 0 + + reward_signals: + # environment reward (default) + extrinsic: + strength: 1.0 + gamma: 0.99 + + # curiosity module + curiosity: + strength: 0.02 + gamma: 0.99 + encoding_size: 256 + learning_rate: 3.0e-4 + + # GAIL + gail: + strength: 0.01 + gamma: 0.99 + encoding_size: 128 + demo_path: Project/Assets/ML-Agents/Examples/Pyramids/Demos/ExpertPyramid.demo + learning_rate: 3.0e-4 + use_actions: false + use_vail: false + + # self-play + self_play: + window: 10 + play_against_latest_model_ratio: 0.5 + save_steps: 50000 + swap_steps: 2000 + team_change: 100000 +``` + +Here is an equivalent file if we use an SAC trainer instead. Notice that the +configurations for the additional functionalities (memory, behavioral cloning, +curiosity and self-play) remain unchanged. + +```yaml +behaviors: + BehaviorSAC: + trainer_type: sac + + # Trainer configs common to PPO/SAC (excluding reward signals) + # same as PPO config + + # SAC-specific configs (replaces the hyperparameters section above) + hyperparameters: + # Hyperparameters common to PPO and SAC + # Same as PPO config + + # SAC-specific hyperparameters + # Replaces the "PPO-specific hyperparameters" section above + buffer_init_steps: 0 + tau: 0.005 + steps_per_update: 10.0 + save_replay_buffer: false + init_entcoef: 0.5 + reward_signal_steps_per_update: 10.0 + + # Configuration of the neural network (common to PPO/SAC) + network_settings: + # Same as PPO config + + # Trainer configurations common to all trainers + # + + # pre-training using behavior cloning + behavioral_cloning: + # same as PPO config + + reward_signals: + # environment reward + extrinsic: + # same as PPO config + + # curiosity module + curiosity: + # same as PPO config + + # GAIL + gail: + # same as PPO config + + # self-play + self_play: + # same as PPO config +``` + +We now break apart the components of the configuration file and describe what +each of these parameters mean and provide guidelines on how to set them. See +[Training Configuration File](Training-Configuration-File.md) for a detailed +description of all the configurations listed above, along with their defaults. +Unless otherwise specified, omitting a configuration will revert it to its default. + +### Default Behavior Settings + +In some cases, you may want to specify a set of default configurations for your Behaviors. 
+This may be useful, for instance, if your Behavior names are generated procedurally by +the environment and not known before runtime, or if you have many Behaviors with very similar +settings. To specify a default configuration, insert a `default_settings` section in your YAML. +This section should be formatted exactly like a configuration for a Behavior. + +```yaml +default_settings: + # < Same as Behavior configuration > +behaviors: + # < Same as above > +``` + +Behaviors found in the environment that aren't specified in the YAML will now use the `default_settings`, +and unspecified settings in behavior configurations will default to the values in `default_settings` if +specified there. + +### Environment Parameters + +In order to control the `EnvironmentParameters` in the Unity simulation during training, +you need to add a section called `environment_parameters`. For example you can set the +value of an `EnvironmentParameter` called `my_environment_parameter` to `3.0` with +the following code : + +```yml +behaviors: + BehaviorY: + # < Same as above > + +# Add this section +environment_parameters: + my_environment_parameter: 3.0 +``` + +Inside the Unity simulation, you can access your Environment Parameters by doing : + +```csharp +Academy.Instance.EnvironmentParameters.GetWithDefault("my_environment_parameter", 0.0f); +``` + +#### Environment Parameter Randomization + +To enable environment parameter randomization, you need to edit the `environment_parameters` +section of your training configuration yaml file. Instead of providing a single float value +for your environment parameter, you can specify a sampler instead. Here is an example with +three environment parameters called `mass`, `length` and `scale`: + +```yml +behaviors: + BehaviorY: + # < Same as above > + +# Add this section +environment_parameters: + mass: + sampler_type: uniform + sampler_parameters: + min_value: 0.5 + max_value: 10 + + length: + sampler_type: multirangeuniform + sampler_parameters: + intervals: [[7, 10], [15, 20]] + + scale: + sampler_type: gaussian + sampler_parameters: + mean: 2 + st_dev: .3 +``` + + +| **Setting** | **Description** | +| :--------------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `sampler_type` | A string identifier for the type of sampler to use for this `Environment Parameter`. | +| `sampler_parameters` | The parameters for a given `sampler_type`. Samplers of different types can have different `sampler_parameters` | + +##### Supported Sampler Types + +Below is a list of the `sampler_type` values supported by the toolkit. + +- `uniform` - Uniform sampler + - Uniformly samples a single float value from a range with a given minimum + and maximum value (inclusive). + - **parameters** - `min_value`, `max_value` +- `gaussian` - Gaussian sampler + - Samples a single float value from a normal distribution with a given mean + and standard deviation. + - **parameters** - `mean`, `st_dev` +- `multirange_uniform` - Multirange uniform sampler + - First, samples an interval from a set of intervals in proportion to relative + length of the intervals. Then, uniformly samples a single float value from the + sampled interval (inclusive). 
This sampler can take an arbitrary number of + intervals in a list in the following format: + [[`interval_1_min`, `interval_1_max`], [`interval_2_min`, + `interval_2_max`], ...] + - **parameters** - `intervals` + +The implementation of the samplers can be found in the +[Samplers.cs file](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Sampler.cs). + +##### Training with Environment Parameter Randomization + +After the sampler configuration is defined, we proceed by launching `mlagents-learn` +and specify trainer configuration with parameter randomization enabled. For example, +if we wanted to train the 3D ball agent with parameter randomization, we would run + +```sh +mlagents-learn config/ppo/3DBall_randomize.yaml --run-id=3D-Ball-randomize +``` + +We can observe progress and metrics via TensorBoard. + +#### Curriculum + +To enable curriculum learning, you need to add a `curriculum` sub-section to your environment +parameter. Here is one example with the environment parameter `my_environment_parameter` : + +```yml +behaviors: + BehaviorY: + # < Same as above > + +# Add this section +environment_parameters: + my_environment_parameter: + curriculum: + - name: MyFirstLesson # The '-' is important as this is a list + completion_criteria: + measure: progress + behavior: my_behavior + signal_smoothing: true + min_lesson_length: 100 + threshold: 0.2 + value: 0.0 + - name: MySecondLesson # This is the start of the second lesson + completion_criteria: + measure: progress + behavior: my_behavior + signal_smoothing: true + min_lesson_length: 100 + threshold: 0.6 + require_reset: true + value: + sampler_type: uniform + sampler_parameters: + min_value: 4.0 + max_value: 7.0 + - name: MyLastLesson + value: 8.0 +``` + +Note that this curriculum __only__ applies to `my_environment_parameter`. The `curriculum` section +contains a list of `Lessons`. In the example, the lessons are named `MyFirstLesson`, `MySecondLesson` +and `MyLastLesson`. +Each `Lesson` has 3 fields : + + - `name` which is a user defined name for the lesson (The name of the lesson will be displayed in + the console when the lesson changes) + - `completion_criteria` which determines what needs to happen in the simulation before the lesson + can be considered complete. When that condition is met, the curriculum moves on to the next + `Lesson`. Note that you do not need to specify a `completion_criteria` for the last `Lesson` + - `value` which is the value the environment parameter will take during the lesson. Note that this + can be a float or a sampler. + + There are the different settings of the `completion_criteria` : + + +| **Setting** | **Description** | +| :------------------ | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `measure` | What to measure learning progress, and advancement in lessons by.

`reward` uses a measure of received reward, `progress` uses the ratio of steps/max_steps, while `Elo` is available only for self-play situations and uses the Elo score as a curriculum completion measure. | +| `behavior` | Specifies which behavior is being tracked. There can be multiple behaviors with different names, each at different points of training. This setting allows the curriculum to track only one of them. | +| `threshold` | The value of `measure` at which the lesson should be incremented. | +| `min_lesson_length` | The minimum number of episodes that should be completed before the lesson can change. If `measure` is set to `reward`, the average cumulative reward of the last `min_lesson_length` episodes will be used to determine if the lesson should change. Must be nonnegative.

**Important**: the average reward that is compared to the thresholds is different than the mean reward that is logged to the console. For example, if `min_lesson_length` is `100`, the lesson will increment after the average cumulative reward of the last `100` episodes exceeds the current threshold. The mean reward logged to the console is dictated by the `summary_freq` parameter defined above. | +| `signal_smoothing` | Whether to weight the current progress measure by previous values. | +| `require_reset` | Whether changing lesson requires the environment to reset (default: false) | +##### Training with a Curriculum + +Once we have specified our metacurriculum and curricula, we can launch +`mlagents-learn` to point to the config file containing +our curricula and PPO will train using Curriculum Learning. For example, to +train agents in the Wall Jump environment with curriculum learning, we can run: + +```sh +mlagents-learn config/ppo/WallJump_curriculum.yaml --run-id=wall-jump-curriculum +``` + +We can then keep track of the current lessons and progresses via TensorBoard. If you've terminated +the run, you can resume it using `--resume` and lesson progress will start off where it +ended. + + +### Training Using Concurrent Unity Instances + +In order to run concurrent Unity instances during training, set the number of +environment instances using the command line option `--num-envs=` when you +invoke `mlagents-learn`. Optionally, you can also set the `--base-port`, which +is the starting port used for the concurrent Unity instances. + +Some considerations: + +- **Buffer Size** - If you are having trouble getting an agent to train, even + with multiple concurrent Unity instances, you could increase `buffer_size` in + the trainer config file. A common practice is to multiply + `buffer_size` by `num-envs`. +- **Resource Constraints** - Invoking concurrent Unity instances is constrained + by the resources on the machine. Please use discretion when setting + `--num-envs=`. +- **Result Variation Using Concurrent Unity Instances** - If you keep all the + hyperparameters the same, but change `--num-envs=`, the results and model + would likely change. diff --git a/com.unity.ml-agents/Documentation~/Training-Plugins.md b/com.unity.ml-agents/Documentation~/Training-Plugins.md new file mode 100644 index 0000000000..24ba45d6a8 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Training-Plugins.md @@ -0,0 +1,59 @@ +# Customizing Training via Plugins + +ML-Agents provides support for running your own python implementations of specific interfaces during the training +process. These interfaces are currently fairly limited, but will be expanded in the future. + +**Note:** Plugin interfaces should currently be considered "in beta", and they may change in future releases. + +## How to Write Your Own Plugin +[This video](https://www.youtube.com/watch?v=fY3Y_xPKWNA) explains the basics of how to create a plugin system using +setuptools, and is the same approach that ML-Agents' plugin system is based on. + +The `ml-agents-plugin-examples` directory contains a reference implementation of each plugin interface, so it's a good +starting point. + +### setup.py +If you don't already have a `setup.py` file for your python code, you'll need to add one. `ml-agents-plugin-examples` +has a [minimal example](../ml-agents-plugin-examples/setup.py) of this. + +In the call to `setup()`, you'll need to add to the `entry_points` dictionary for each plugin interface that you +implement. 
The form of this is `{entry point name}={plugin module}:{plugin function}`. For example, in + `ml-agents-plugin-examples`: +```python +entry_points={ + ML_AGENTS_STATS_WRITER: [ + "example=mlagents_plugin_examples.example_stats_writer:get_example_stats_writer" + ] +} +``` +* `ML_AGENTS_STATS_WRITER` (which is a string constant, `mlagents.stats_writer`) is the name of the plugin interface. +This must be one of the provided interfaces ([see below](#plugin-interfaces)). +* `example` is the plugin implementation name. This can be anything. +* `mlagents_plugin_examples.example_stats_writer` is the plugin module. This points to the module where the +plugin registration function is defined. +* `get_example_stats_writer` is the plugin registration function. This is called when running `mlagents-learn`. The +arguments and expected return type for this are different for each plugin interface. + +### Local Installation +Once you've defined `entry_points` in your `setup.py`, you will need to run +``` +pip install -e [path to your plugin code] +``` +in the same python virtual environment that you have `mlagents` installed. + +## Plugin Interfaces + +### StatsWriter +The StatsWriter class receives various information from the training process, such as the average Agent reward in +each summary period. By default, we log this information to the console and write it to +[TensorBoard](Using-Tensorboard.md). + +#### Interface +The `StatsWriter.write_stats()` method must be implemented in any derived classes. It takes a "category" parameter, +which typically is the behavior name of the Agents being trained, and a dictionary of `StatSummary` values with +string keys. Additionally, `StatsWriter.on_add_stat()` may be extended to register a callback handler for each stat +emission. + +#### Registration +The `StatsWriter` registration function takes a `RunOptions` argument and returns a list of `StatsWriter`s. An +example implementation is provided in [`mlagents_plugin_examples`](../ml-agents-plugin-examples/mlagents_plugin_examples/example_stats_writer.py) diff --git a/com.unity.ml-agents/Documentation~/Training-on-Amazon-Web-Service.md b/com.unity.ml-agents/Documentation~/Training-on-Amazon-Web-Service.md new file mode 100644 index 0000000000..d6549044d2 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Training-on-Amazon-Web-Service.md @@ -0,0 +1,328 @@ +# Training on Amazon Web Service + +:warning: **Note:** We no longer use this guide ourselves and so it may not work +correctly. We've decided to keep it up just in case it is helpful to you. + +This page contains instructions for setting up an EC2 instance on Amazon Web +Service for training ML-Agents environments. + +## Pre-configured AMI + +We've prepared a pre-configured AMI for you with the ID: `ami-016ff5559334f8619` +in the `us-east-1` region. It was created as a modification of +[Deep Learning AMI (Ubuntu)](https://aws.amazon.com/marketplace/pp/B077GCH38C). +The AMI has been tested with p2.xlarge instance. Furthermore, if you want to +train without headless mode, you need to enable X Server. 
+ +After launching your EC2 instance using the ami and ssh into it, run the +following commands to enable it: + +```sh +# Start the X Server, press Enter to come to the command line +$ sudo /usr/bin/X :0 & + +# Check if Xorg process is running +# You will have a list of processes running on the GPU, Xorg should be in the +# list, as shown below +$ nvidia-smi + +# Thu Jun 14 20:27:26 2018 +# +-----------------------------------------------------------------------------+ +# | NVIDIA-SMI 390.67 Driver Version: 390.67 | +# |-------------------------------+----------------------+----------------------+ +# | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | +# | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | +# |===============================+======================+======================| +# | 0 Tesla K80 On | 00000000:00:1E.0 Off | 0 | +# | N/A 35C P8 31W / 149W | 9MiB / 11441MiB | 0% Default | +# +-------------------------------+----------------------+----------------------+ +# +# +-----------------------------------------------------------------------------+ +# | Processes: GPU Memory | +# | GPU PID Type Process name Usage | +# |=============================================================================| +# | 0 2331 G /usr/lib/xorg/Xorg 8MiB | +# +-----------------------------------------------------------------------------+ + +# Make the ubuntu use X Server for display +$ export DISPLAY=:0 +``` + +## Configuring your own instance + +You could also choose to configure your own instance. To begin with, you will +need an EC2 instance which contains the latest Nvidia drivers, CUDA9, and cuDNN. +In this tutorial we used the +[Deep Learning AMI (Ubuntu)](https://aws.amazon.com/marketplace/pp/B077GCH38C) +listed under AWS Marketplace with a p2.xlarge instance. + +### Installing the ML-Agents Toolkit on the instance + +After launching your EC2 instance using the ami and ssh into it: + +1. Activate the python3 environment + + ```sh + source activate python3 + ``` + +2. Clone the ML-Agents repo and install the required Python packages + + ```sh + git clone --branch release_22 https://github.com/Unity-Technologies/ml-agents.git + cd ml-agents/ml-agents/ + pip3 install -e . + ``` + +### Setting up X Server (optional) + +X Server setup is only necessary if you want to do training that requires visual +observation input. _Instructions here are adapted from this +[Medium post](https://medium.com/towards-data-science/how-to-run-unity-on-amazon-cloud-or-without-monitor-3c10ce022639) +on running general Unity applications in the cloud._ + +Current limitations of the Unity Engine require that a screen be available to +render to when using visual observations. In order to make this possible when +training on a remote server, a virtual screen is required. We can do this by +installing Xorg and creating a virtual screen. Once installed and created, we +can display the Unity environment in the virtual environment, and train as we +would on a local machine. Ensure that `headless` mode is disabled when building +linux executables which use visual observations. 
+ +#### Install and setup Xorg: + + ```sh + # Install Xorg + $ sudo apt-get update + $ sudo apt-get install -y xserver-xorg mesa-utils + $ sudo nvidia-xconfig -a --use-display-device=None --virtual=1280x1024 + + # Get the BusID information + $ nvidia-xconfig --query-gpu-info + + # Add the BusID information to your /etc/X11/xorg.conf file + $ sudo sed -i 's/ BoardName "Tesla K80"/ BoardName "Tesla K80"\n BusID "0:30:0"/g' /etc/X11/xorg.conf + + # Remove the Section "Files" from the /etc/X11/xorg.conf file + # And remove two lines that contain Section "Files" and EndSection + $ sudo vim /etc/X11/xorg.conf + ``` + +#### Update and setup Nvidia driver: + + ```sh + # Download and install the latest Nvidia driver for ubuntu + # Please refer to http://download.nvidia.com/XFree86/Linux-#x86_64/latest.txt + $ wget http://download.nvidia.com/XFree86/Linux-x86_64/390.87/NVIDIA-Linux-x86_64-390.87.run + $ sudo /bin/bash ./NVIDIA-Linux-x86_64-390.87.run --accept-license --no-questions --ui=none + + # Disable Nouveau as it will clash with the Nvidia driver + $ sudo echo 'blacklist nouveau' | sudo tee -a /etc/modprobe.d/blacklist.conf + $ sudo echo 'options nouveau modeset=0' | sudo tee -a /etc/modprobe.d/blacklist.conf + $ sudo echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf + $ sudo update-initramfs -u + ``` + +#### Restart the EC2 instance: + + ```sh + sudo reboot now + ``` + +#### Make sure there are no Xorg processes running: + +```sh +# Kill any possible running Xorg processes +# Note that you might have to run this command multiple times depending on +# how Xorg is configured. +$ sudo killall Xorg + +# Check if there is any Xorg process left +# You will have a list of processes running on the GPU, Xorg should not be in +# the list, as shown below. +$ nvidia-smi + +# Thu Jun 14 20:21:11 2018 +# +-----------------------------------------------------------------------------+ +# | NVIDIA-SMI 390.67 Driver Version: 390.67 | +# |-------------------------------+----------------------+----------------------+ +# | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | +# | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | +# |===============================+======================+======================| +# | 0 Tesla K80 On | 00000000:00:1E.0 Off | 0 | +# | N/A 37C P8 31W / 149W | 0MiB / 11441MiB | 0% Default | +# +-------------------------------+----------------------+----------------------+ +# +# +-----------------------------------------------------------------------------+ +# | Processes: GPU Memory | +# | GPU PID Type Process name Usage | +# |=============================================================================| +# | No running processes found | +# +-----------------------------------------------------------------------------+ + +``` + +#### Start X Server and make the ubuntu use X Server for display: + + ```console + # Start the X Server, press Enter to come back to the command line + $ sudo /usr/bin/X :0 & + + # Check if Xorg process is running + # You will have a list of processes running on the GPU, Xorg should be in the list. + $ nvidia-smi + + # Make the ubuntu use X Server for display + $ export DISPLAY=:0 + ``` + +#### Ensure the Xorg is correctly configured: + + ```sh + # For more information on glxgears, see ftp://www.x.org/pub/X11R6.8.1/doc/glxgears.1.html. + $ glxgears + # If Xorg is configured correctly, you should see the following message + + # Running synchronized to the vertical refresh. 
The framerate should be + # approximately the same as the monitor refresh rate. + # 137296 frames in 5.0 seconds = 27459.053 FPS + # 141674 frames in 5.0 seconds = 28334.779 FPS + # 141490 frames in 5.0 seconds = 28297.875 FPS + + ``` + +## Training on EC2 instance + +1. In the Unity Editor, load a project containing an ML-Agents environment (you + can use one of the example environments if you have not created your own). +2. Open the Build Settings window (menu: File > Build Settings). +3. Select Linux as the Target Platform, and x86_64 as the target architecture + (the default x86 currently does not work). +4. Check Headless Mode if you have not setup the X Server. (If you do not use + Headless Mode, you have to setup the X Server to enable training.) +5. Click Build to build the Unity environment executable. +6. Upload the executable to your EC2 instance within `ml-agents` folder. +7. Change the permissions of the executable. + + ```sh + chmod +x .x86_64 + ``` + +8. (Without Headless Mode) Start X Server and use it for display: + + ```sh + # Start the X Server, press Enter to come back to the command line + $ sudo /usr/bin/X :0 & + + # Check if Xorg process is running + # You will have a list of processes running on the GPU, Xorg should be in the list. + $ nvidia-smi + + # Make the ubuntu use X Server for display + $ export DISPLAY=:0 + ``` + +9. Test the instance setup from Python using: + + ```python + from mlagents_envs.environment import UnityEnvironment + + env = UnityEnvironment() + ``` + + Where `` corresponds to the path to your environment executable. + + You should receive a message confirming that the environment was loaded + successfully. + +10. Train your models + + ```console + mlagents-learn --env= --train + ``` + +## FAQ + +### The \_Data folder hasn't been copied cover + +If you've built your Linux executable, but forget to copy over the corresponding +\_Data folder, you will see error message like the following: + +```sh +Set current directory to /home/ubuntu/ml-agents/ml-agents +Found path: /home/ubuntu/ml-agents/ml-agents/3dball_linux.x86_64 +no boot config - using default values + +(Filename: Line: 403) + +There is no data folder +``` + +### Unity Environment not responding + +If you didn't setup X Server or hasn't launched it properly, or your environment +somehow crashes, or you haven't `chmod +x` your Unity Environment, all of these +will cause connection between Unity and Python to fail. Then you will see +something like this: + +```console +Logging to /home/ubuntu/.config/unity3d//Player.log +Traceback (most recent call last): + File "", line 1, in + File "/home/ubuntu/ml-agents/ml-agents/mlagents_envs/environment.py", line 63, in __init__ + aca_params = self.send_academy_parameters(rl_init_parameters_in) + File "/home/ubuntu/ml-agents/ml-agents/mlagents_envs/environment.py", line 489, in send_academy_parameters + return self.communicator.initialize(inputs).rl_initialization_output + File "/home/ubuntu/ml-agents/ml-agents/mlagents_envs/rpc_communicator.py", line 60, in initialize +mlagents_envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that : + The environment does not need user interaction to launch + The environment and the Python interface have compatible versions. +``` + +It would be also really helpful to check your +/home/ubuntu/.config/unity3d//Player.log to see what happens with +your Unity environment. 
+ +### Could not launch X Server + +When you execute: + +```sh +sudo /usr/bin/X :0 & +``` + +You might see something like: + +```sh +X.Org X Server 1.18.4 +... +(==) Log file: "/var/log/Xorg.0.log", Time: Thu Oct 11 21:10:38 2018 +(==) Using config file: "/etc/X11/xorg.conf" +(==) Using system config directory "/usr/share/X11/xorg.conf.d" +(EE) +Fatal server error: +(EE) no screens found(EE) +(EE) +Please consult the X.Org Foundation support + at http://wiki.x.org + for help. +(EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information. +(EE) +(EE) Server terminated with error (1). Closing log file. +``` + +And when you execute: + +```sh +nvidia-smi +``` + +You might see something like: + +```sh +NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running. +``` + +This means the NVIDIA's driver needs to be updated. Refer to +[this section](Training-on-Amazon-Web-Service.md#update-and-setup-nvidia-driver) +for more information. diff --git a/com.unity.ml-agents/Documentation~/Training-on-Microsoft-Azure.md b/com.unity.ml-agents/Documentation~/Training-on-Microsoft-Azure.md new file mode 100644 index 0000000000..759cc145c9 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Training-on-Microsoft-Azure.md @@ -0,0 +1,210 @@ +# Training on Microsoft Azure (works with ML-Agents Toolkit v0.3) + +:warning: **Note:** We no longer use this guide ourselves and so it may not work +correctly. We've decided to keep it up just in case it is helpful to you. + +This page contains instructions for setting up training on Microsoft Azure +through either +[Azure Container Instances](https://azure.microsoft.com/en-us/products/container-instances/) +or Virtual Machines. Non "headless" training has not yet been tested to verify +support. + +## Pre-Configured Azure Virtual Machine + +A pre-configured virtual machine image is available in the Azure Marketplace and +is nearly completely ready for training. You can start by deploying the +[Data Science Virtual Machine for Linux (Ubuntu)](https://learn.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/dsvm-ubuntu-intro?view=azureml-api-2) +into your Azure subscription. + +Note that, if you choose to deploy the image to an +[N-Series GPU optimized VM](https://docs.microsoft.com/azure/virtual-machines/linux/sizes-gpu), +training will, by default, run on the GPU. If you choose any other type of VM, +training will run on the CPU. + +## Configuring your own Instance + +Setting up your own instance requires a number of package installations. Please +view the documentation for doing so [here](#custom-instances). + +## Installing ML-Agents + +1. [Move](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/copy-files-to-linux-vm-using-scp) + the `ml-agents` sub-folder of this ml-agents repo to the remote Azure + instance, and set it as the working directory. +2. Install the required packages: + Torch: `pip3 install torch==1.7.0 -f https://download.pytorch.org/whl/torch_stable.html` and + MLAgents: `python -m pip install mlagents==1.1.0` + +## Testing + +To verify that all steps worked correctly: + +1. In the Unity Editor, load a project containing an ML-Agents environment (you + can use one of the example environments if you have not created your own). +2. Open the Build Settings window (menu: File > Build Settings). +3. Select Linux as the Target Platform, and x86_64 as the target architecture. +4. Check Headless Mode. +5. 
Click Build to build the Unity environment executable.
6. Upload the resulting files to your Azure instance.
7. Test the instance setup from Python using:

```python
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(file_name="", seed=1, side_channels=[])
```

Where `` corresponds to the path to your environment executable (i.e.
`/home/UserName/Build/yourFile`).

You should receive a message confirming that the environment was loaded
successfully.

**Note:** When running your environment in headless mode, you must append
`--no-graphics` to your `mlagents-learn` command, as it won't train otherwise.
You can verify this by aborting a training run and checking whether it reports
"Model Saved" or "Aborted", or by checking whether a `.onnx` file was generated
in the results folder.

## Running Training on your Virtual Machine

To run your training on the VM:

1. [Move](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/copy-files-to-linux-vm-using-scp)
   your built Unity application to your Virtual Machine.
2. Set the directory where the ML-Agents Toolkit was installed as your working
   directory.
3. Run the following command:

```sh
mlagents-learn --env= --run-id= --train
```

Where `` is the path to your app (i.e.
`~/unity-volume/3DBallHeadless`) and `` is an identifier you would like
to use to identify your training run.

If you've chosen to run on an N-Series VM with GPU support, you can verify that
the GPU is being used by running `nvidia-smi` from the command line.

## Monitoring your Training Run with TensorBoard

Once you have started training, you can
[use TensorBoard to observe the training](Using-Tensorboard.md).

1. Start by
   [opening the appropriate port for web traffic to connect to your VM](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/nsg-quickstart-portal).

   - Note that you don't need to generate a new `Network Security Group`;
     instead, go to the **Networking** tab under **Settings** for your VM.
   - As an example, you could open the port with the following Inbound Rule
     settings:
     - Source: Any
     - Source Port Ranges: \*
     - Destination: Any
     - Destination Port Ranges: 6006
     - Protocol: Any
     - Action: Allow
     - Priority: (Leave as default)

2. Unless you started the training as a background process, connect to your VM
   from another terminal instance.
3. Run the following command from your terminal:
   `tensorboard --logdir results --host 0.0.0.0`
4. You should now be able to open a browser and navigate to
   `:6006` to view the TensorBoard report.

## Running on Azure Container Instances

[Azure Container Instances](https://azure.microsoft.com/en-us/products/container-instances/)
allow you to spin up a container, on demand, that will run your training and
then be shut down. This ensures you aren't leaving a billable VM running when it
isn't needed. Using ACI enables you to offload training of your models without
needing to install Python and TensorFlow on your own computer.

## Custom Instances

This page contains instructions for setting up a custom Virtual Machine on
Microsoft Azure so you can run ML-Agents training in the cloud.

1. Start by
   [deploying an Azure VM](https://docs.microsoft.com/azure/virtual-machines/linux/quick-create-portal)
   with Ubuntu Linux (tests were done with 16.04 LTS). To use GPU support, use
   an N-Series VM.
2. SSH into your VM.
3.
Start with the following commands to install the Nvidia driver: + + ```sh + wget http://us.download.nvidia.com/tesla/375.66/nvidia-diag-driver-local-repo-ubuntu1604_375.66-1_amd64.deb + + sudo dpkg -i nvidia-diag-driver-local-repo-ubuntu1604_375.66-1_amd64.deb + + sudo apt-get update + + sudo apt-get install cuda-drivers + + sudo reboot + ``` + +4. After a minute you should be able to reconnect to your VM and install the + CUDA toolkit: + + ```sh + wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb + + sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb + + sudo apt-get update + + sudo apt-get install cuda-8-0 + ``` + +5. You'll next need to download cuDNN from the Nvidia developer site. This + requires a registered account. + +6. Navigate to [http://developer.nvidia.com](http://developer.nvidia.com) and + create an account and verify it. + +7. Download (to your own computer) cuDNN from + [this url](https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v6/prod/8.0_20170307/Ubuntu16_04_x64/libcudnn6_6.0.20-1+cuda8.0_amd64-deb). + +8. Copy the deb package to your VM: + + ```sh + scp libcudnn6_6.0.21-1+cuda8.0_amd64.deb @:libcudnn6_6.0.21-1+cuda8.0_amd64.deb + ``` + +9. SSH back to your VM and execute the following: + + ```console + sudo dpkg -i libcudnn6_6.0.21-1+cuda8.0_amd64.deb + + export LD_LIBRARY_PATH=/usr/local/cuda/lib64/:/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH + . ~/.profile + + sudo reboot + ``` + +10. After a minute, you should be able to SSH back into your VM. After doing so, + run the following: + + ```sh + sudo apt install python-pip + sudo apt install python3-pip + ``` + +11. At this point, you need to install TensorFlow. The version you install + should be tied to if you are using GPU to train: + + ```sh + pip3 install tensorflow-gpu==1.4.0 keras==2.0.6 + ``` + + Or CPU to train: + + ```sh + pip3 install tensorflow==1.4.0 keras==2.0.6 + ``` + +12. You'll then need to install additional dependencies: + + ```sh + pip3 install pillow + pip3 install numpy + ``` diff --git a/com.unity.ml-agents/Documentation~/Tutorial-Custom-Trainer-Plugin.md b/com.unity.ml-agents/Documentation~/Tutorial-Custom-Trainer-Plugin.md new file mode 100644 index 0000000000..06e9d2bc0e --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Tutorial-Custom-Trainer-Plugin.md @@ -0,0 +1,304 @@ +# Custom Trainer Plugin + +## How to write a custom trainer plugin + +### Step 1: Write your custom trainer class +Before you start writing your code, make sure to use your favorite environment management tool(e.g. `venv` or `conda`) to create and activate a Python virtual environment. The following command uses `conda`, but other tools work similarly: +```shell +conda create -n trainer-env python=3.10.12 +conda activate trainer-env +``` + +Users of the plug-in system are responsible for implementing the trainer class subject to the API standard. Let us follow an example by implementing a custom trainer named "YourCustomTrainer". You can either extend `OnPolicyTrainer` or `OffPolicyTrainer` classes depending on the training strategies you choose. + +Please refer to the internal [PPO implementation](../ml-agents/mlagents/trainers/ppo/trainer.py) for a complete code example. We will not provide a workable code in the document. The purpose of the tutorial is to introduce you to the core components and interfaces of our plugin framework. We use code snippets and patterns to demonstrate the control and data flow. 
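
At a high level, a custom trainer is a class that extends one of the base trainer classes and overrides a handful of methods. The skeleton below is only a structural sketch: the import path is an assumption based on the internal trainers and the method bodies are intentionally left out (they are discussed one by one in the rest of this step).

```python
# Structural sketch only -- the import path below is an assumption and may
# differ between ml-agents versions; check the internal PPO trainer for the
# exact location of OnPolicyTrainer in your installation.
from mlagents.trainers.trainer.on_policy_trainer import OnPolicyTrainer


class YourCustomTrainer(OnPolicyTrainer):
    """Coordinates a policy and an optimizer for a custom training algorithm."""

    @staticmethod
    def get_trainer_name() -> str:
        # The name used as `trainer_type` in the training configuration file.
        return "your_trainer_type"

    def create_policy(self, parsed_behavior_id, behavior_spec):
        ...  # build and return a TorchPolicy (see the snippet below)

    def create_optimizer(self):
        ...  # build and return a TorchOptimizer connected to self.policy

    def _process_trajectory(self, trajectory):
        ...  # compute value estimates and rewards, then append to the update buffer

    def _update_policy(self):
        ...  # sample from the update buffer and run an optimizer update
```
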

Your custom trainers are responsible for collecting experiences and training the models. Your custom trainer class acts as a coordinator between the policy and the optimizer. To start implementing methods in the class, create a policy object with the `create_policy` method:


```python
def create_policy(
    self, parsed_behavior_id: BehaviorIdentifiers, behavior_spec: BehaviorSpec
) -> TorchPolicy:

    actor_cls: Union[Type[SimpleActor], Type[SharedActorCritic]] = SimpleActor
    actor_kwargs: Dict[str, Any] = {
        "conditional_sigma": False,
        "tanh_squash": False,
    }
    if self.shared_critic:
        reward_signal_configs = self.trainer_settings.reward_signals
        reward_signal_names = [
            key.value for key, _ in reward_signal_configs.items()
        ]
        actor_cls = SharedActorCritic
        actor_kwargs.update({"stream_names": reward_signal_names})

    policy = TorchPolicy(
        self.seed,
        behavior_spec,
        self.trainer_settings.network_settings,
        actor_cls,
        actor_kwargs,
    )
    return policy
```

Depending on whether you use a shared or separate network architecture for your policy, we provide `SimpleActor` and `SharedActorCritic` from `mlagents.trainers.torch_entities.networks` that you can choose from. In the example above, we use a `SimpleActor`.

Next, create an optimizer object in the `create_optimizer` method and connect it to the policy object you created above:


```python
def create_optimizer(self) -> TorchOptimizer:
    return TorchPPOOptimizer(  # type: ignore
        cast(TorchPolicy, self.policy), self.trainer_settings  # type: ignore
    )  # type: ignore
```

There are a couple of abstract methods (`_process_trajectory` and `_update_policy`) inherited from `RLTrainer` that you need to implement in your custom trainer class. `_process_trajectory` takes a trajectory and processes it, putting it into the update buffer. Processing involves calculating value and advantage targets for the model updating step. Given the input `trajectory: Trajectory`, you are responsible for processing the data in the trajectory and appending `agent_buffer_trajectory` to the back of the update buffer by calling `self._append_to_update_buffer(agent_buffer_trajectory)`; this buffer is then used to update the model in the `optimizer` class.

A typical (incomplete) `_process_trajectory` function will convert the trajectory object to an agent buffer and then get all value estimates from the trajectory by calling `self.optimizer.get_trajectory_value_estimates`.
From the returned dictionary of value estimates we extract reward signals keyed by their names: + +```python +def _process_trajectory(self, trajectory: Trajectory) -> None: + super()._process_trajectory(trajectory) + agent_id = trajectory.agent_id # All the agents should have the same ID + + agent_buffer_trajectory = trajectory.to_agentbuffer() + + # Get all value estimates + ( + value_estimates, + value_next, + value_memories, + ) = self.optimizer.get_trajectory_value_estimates( + agent_buffer_trajectory, + trajectory.next_obs, + trajectory.done_reached and not trajectory.interrupted, + ) + + for name, v in value_estimates.items(): + agent_buffer_trajectory[RewardSignalUtil.value_estimates_key(name)].extend( + v + ) + self._stats_reporter.add_stat( + f"Policy/{self.optimizer.reward_signals[name].name.capitalize()} Value Estimate", + np.mean(v), + ) + + # Evaluate all reward functions + self.collected_rewards["environment"][agent_id] += np.sum( + agent_buffer_trajectory[BufferKey.ENVIRONMENT_REWARDS] + ) + for name, reward_signal in self.optimizer.reward_signals.items(): + evaluate_result = ( + reward_signal.evaluate(agent_buffer_trajectory) * reward_signal.strength + ) + agent_buffer_trajectory[RewardSignalUtil.rewards_key(name)].extend( + evaluate_result + ) + # Report the reward signals + self.collected_rewards[name][agent_id] += np.sum(evaluate_result) + + self._append_to_update_buffer(agent_buffer_trajectory) + +``` + +A trajectory will be a list of dictionaries of strings mapped to `Anything`. When calling `forward` on a policy, the argument will include an “experience” dictionary from the last step. The `forward` method will generate an action and the next “experience” dictionary. Examples of fields in the “experience” dictionary include observation, action, reward, done status, group_reward, LSTM memory state, etc. + + + +### Step 2: implement your custom optimizer for the trainer. +We will show you an example we implemented - `class TorchPPOOptimizer(TorchOptimizer)`, which takes a Policy and a Dict of trainer parameters and creates an Optimizer that connects to the policy. Your optimizer should include a value estimator and a loss function in the `update` method. + +Before writing your optimizer class, first define setting class `class PPOSettings(OnPolicyHyperparamSettings)` for your custom optimizer: + + + +```python +class PPOSettings(OnPolicyHyperparamSettings): + beta: float = 5.0e-3 + epsilon: float = 0.2 + lambd: float = 0.95 + num_epoch: int = 3 + shared_critic: bool = False + learning_rate_schedule: ScheduleType = ScheduleType.LINEAR + beta_schedule: ScheduleType = ScheduleType.LINEAR + epsilon_schedule: ScheduleType = ScheduleType.LINEAR + +``` + +You should implement `update` function following interface: + + +```python +def update(self, batch: AgentBuffer, num_sequences: int) -> Dict[str, float]: + +``` + +In which losses and other metrics are calculated from an `AgentBuffer` that is generated from your trainer class, depending on which model you choose to implement the loss functions will be different. In our case we calculate value loss from critic and trust region policy loss. 
A typical pattern(incomplete) of the calculations will look like the following: + + +```python +run_out = self.policy.actor.get_stats( + current_obs, + actions, + masks=act_masks, + memories=memories, + sequence_length=self.policy.sequence_length, +) + +log_probs = run_out["log_probs"] +entropy = run_out["entropy"] + +values, _ = self.critic.critic_pass( + current_obs, + memories=value_memories, + sequence_length=self.policy.sequence_length, +) +policy_loss = ModelUtils.trust_region_policy_loss( + ModelUtils.list_to_tensor(batch[BufferKey.ADVANTAGES]), + log_probs, + old_log_probs, + loss_masks, + decay_eps, +) +loss = ( + policy_loss + + 0.5 * value_loss + - decay_bet * ModelUtils.masked_mean(entropy, loss_masks) +) + +``` + +Finally update the model and return the a dictionary including calculated losses and updated decay learning rate: + + +```python +ModelUtils.update_learning_rate(self.optimizer, decay_lr) +self.optimizer.zero_grad() +loss.backward() + +self.optimizer.step() +update_stats = { + "Losses/Policy Loss": torch.abs(policy_loss).item(), + "Losses/Value Loss": value_loss.item(), + "Policy/Learning Rate": decay_lr, + "Policy/Epsilon": decay_eps, + "Policy/Beta": decay_bet, +} + +``` + +### Step 3: Integrate your custom trainer into the plugin system + +By integrating a custom trainer into the plugin system, a user can use their published packages which have their implementations. To do that, you need to add a setup.py file. In the call to setup(), you'll need to add to the entry_points dictionary for each plugin interface that you implement. The form of this is {entry point name}={plugin module}:{plugin function}. For example: + + + +```python +entry_points={ + ML_AGENTS_TRAINER_TYPE: [ + "your_trainer_type=your_package.your_custom_trainer:get_type_and_setting" + ] + }, +``` + +Some key elements in the code: + +``` +ML_AGENTS_TRAINER_TYPE: a string constant for trainer type +your_trainer_type: name your trainer type, used in configuration file +your_package: your pip installable package containing custom trainer implementation +``` + +Also define `get_type_and_setting` method in `YourCustomTrainer` class: + + +```python +def get_type_and_setting(): + return {YourCustomTrainer.get_trainer_name(): YourCustomTrainer}, { + YourCustomTrainer.get_trainer_name(): YourCustomSetting + } + +``` + +Finally, specify trainer type in the config file: + + +```python +behaviors: + 3DBall: + trainer_type: your_trainer_type +... +``` + +### Step 4: Install your custom trainer and run training: +Before installing your custom trainer package, make sure you have `ml-agents-env` and `ml-agents` installed + +```shell +pip3 install -e ./ml-agents-envs && pip3 install -e ./ml-agents +``` + +Install your custom trainer package(if your package is pip installable): +```shell +pip3 install your_custom_package +``` +Or follow our internal implementations: +```shell +pip3 install -e ./ml-agents-trainer-plugin +``` + +Following the previous installations your package is added as an entrypoint and you can use a config file with new +trainers: +```shell +mlagents-learn ml-agents-trainer-plugin/mlagents_trainer_plugin/a2c/a2c_3DBall.yaml --run-id +--env +``` + +### Validate your implementations: +Create a clean Python environment with Python 3.10.12 and activate it before you start, if you haven't done so already: +```shell +conda create -n trainer-env python=3.10.12 +conda activate trainer-env +``` + +Make sure you follow previous steps and install all required packages. 
We are testing internal implementations in this tutorial, but ML-Agents users can run similar validations once they have their own implementations installed: +```shell +pip3 install -e ./ml-agents-envs && pip3 install -e ./ml-agents +pip3 install -e ./ml-agents-trainer-plugin +``` +Once your package is added as an `entrypoint`, you can add to the config file the new trainer type. Check if trainer type is specified in the config file `a2c_3DBall.yaml`: +``` +trainer_type: a2c +``` + +Test if custom trainer package is installed by running: +```shell +mlagents-learn ml-agents-trainer-plugin/mlagents_trainer_plugin/a2c/a2c_3DBall.yaml --run-id test-trainer +``` + +You can also list all trainers installed in the registry. Type `python` in your shell to open a REPL session. Run the python code below, you should be able to see all trainer types currently installed: +```python +>>> import pkg_resources +>>> for entry in pkg_resources.iter_entry_points('mlagents.trainer_type'): +... print(entry) +... +default = mlagents.plugins.trainer_type:get_default_trainer_types +a2c = mlagents_trainer_plugin.a2c.a2c_trainer:get_type_and_setting +dqn = mlagents_trainer_plugin.dqn.dqn_trainer:get_type_and_setting +``` + +If it is properly installed, you will see Unity logo and message indicating training will start: +``` +[INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor. +``` + +If you see the following error message, it could be due to trainer type is wrong or the trainer type specified is not installed: +```shell +mlagents.trainers.exception.TrainerConfigError: Invalid trainer type a2c was found +``` + diff --git a/com.unity.ml-agents/Documentation~/Unity-Environment-Registry.md b/com.unity.ml-agents/Documentation~/Unity-Environment-Registry.md new file mode 100644 index 0000000000..27f14561ed --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Unity-Environment-Registry.md @@ -0,0 +1,61 @@ +# Unity Environment Registry [Experimental] + +The Unity Environment Registry is a database of pre-built Unity environments that can be easily used without having to install the Unity Editor. It is a great way to get started with our [UnityEnvironment API](Python-LLAPI.md). + +## Loading an Environment from the Registry + +To get started, you can access the default registry we provide with our [Example Environments](Learning-Environment-Examples.md). The Unity Environment Registry implements a _Mapping_, therefore, you can access an entry with its identifier with the square brackets `[ ]`. Use the following code to list all of the environment identifiers present in the default registry: + +```python +from mlagents_envs.registry import default_registry + +environment_names = list(default_registry.keys()) +for name in environment_names: + print(name) +``` + +The `make()` method on a registry value will return a `UnityEnvironment` ready to be used. All arguments passed to the make method will be passed to the constructor of the `UnityEnvironment` as well. Refer to the documentation on the [Python-API](Python-LLAPI.md) for more information about the arguments of the `UnityEnvironment` constructor. 
For example, the following code will create the environment under the identifier `"my-env"`, reset it, perform a few steps and finally close it: + +```python +from mlagents_envs.registry import default_registry + +env = default_registry["my-env"].make() +env.reset() +for _ in range(10): + env.step() +env.close() +``` + +## Create and share your own registry + +In order to share the `UnityEnvironment` you created, you must: + + - [Create a Unity executable](Learning-Environment-Executable.md) of your environment for each platform (Linux, OSX and/or Windows) + - Place each executable in a `zip` compressed folder + - Upload each zip file online to your preferred hosting platform + - Create a `yaml` file that will contain the description and path to your environment + - Upload the `yaml` file online +The `yaml` file must have the following format : + +```yaml +environments: + - : + expected_reward: + description: + linux_url: + darwin_url: + win_url: + additional_args: + - + - ... +``` + +Your users can now use your environment with the following code : +```python +from mlagents_envs.registry import UnityEnvRegistry + +registry = UnityEnvRegistry() +registry.register_from_yaml("url-or-path-to-your-yaml-file") +``` + __Note__: The `"url-or-path-to-your-yaml-file"` can be either a url or a local path. + diff --git a/com.unity.ml-agents/Documentation~/Using-Docker.md b/com.unity.ml-agents/Documentation~/Using-Docker.md new file mode 100644 index 0000000000..000478f345 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Using-Docker.md @@ -0,0 +1,177 @@ +# Using Docker For ML-Agents (Deprecated) + +:warning: **Note:** We no longer use this guide ourselves and so it may not work +correctly. We've decided to keep it up just in case it is helpful to you. + +We currently offer a solution for Windows and Mac users who would like to do +training or inference using Docker. This option may be appealing to those who +would like to avoid installing Python and TensorFlow themselves. The current +setup forces both TensorFlow and Unity to _only_ rely on the CPU for +computations. Consequently, our Docker simulation does not use a GPU and uses +[`Xvfb`](https://en.wikipedia.org/wiki/Xvfb) to do visual rendering. `Xvfb` is a +utility that enables `ML-Agents` (or any other application) to do rendering +virtually i.e. it does not assume that the machine running `ML-Agents` has a GPU +or a display attached to it. This means that rich environments which involve +agents using camera-based visual observations might be slower. + +## Requirements + +- [Docker](https://www.docker.com) +- Unity _Linux Build Support_ Component. Make sure to select the _Linux Build + Support_ component when installing Unity. + +


+ +## Setup + +- [Download](https://unity3d.com/get-unity/download) the Unity Installer and add + the _Linux Build Support_ Component + +- [Download](https://www.docker.com/community-edition#/download) and install + Docker if you don't have it setup on your machine. + +- Since Docker runs a container in an environment that is isolated from the host + machine, a mounted directory in your host machine is used to share data, e.g. + the trainer configuration file, Unity executable and + TensorFlow graph. For convenience, we created an empty `unity-volume` + directory at the root of the repository for this purpose, but feel free to use + any other directory. The remainder of this guide assumes that the + `unity-volume` directory is the one used. + +## Usage + +Using Docker for ML-Agents involves three steps: building the Unity environment +with specific flags, building a Docker container and, finally, running the +container. If you are not familiar with building a Unity environment for +ML-Agents, please read through our +[Getting Started with the 3D Balance Ball Example](Getting-Started.md) guide +first. + +### Build the Environment (Optional) + +_If you want to used the Editor to perform training, you can skip this step._ + +Since Docker typically runs a container sharing a (linux) kernel with the host +machine, the Unity environment **has** to be built for the **linux platform**. +When building a Unity environment, please select the following options from the +the Build Settings window: + +- Set the _Target Platform_ to `Linux` +- Set the _Architecture_ to `x86_64` + +Then click `Build`, pick an environment name (e.g. `3DBall`) and set the output +directory to `unity-volume`. After building, ensure that the file +`.x86_64` and subdirectory `_Data/` are +created under `unity-volume`. + +![Build Settings For Docker](images/docker_build_settings.png) + +### Build the Docker Container + +First, make sure the Docker engine is running on your machine. Then build the +Docker container by calling the following command at the top-level of the +repository: + +```sh +docker build -t . +``` + +Replace `` with a name for the Docker image, e.g. +`balance.ball.v0.1`. + +### Run the Docker Container + +Run the Docker container by calling the following command at the top-level of +the repository: + +```sh +docker run -it --name \ + --mount type=bind,source="$(pwd)"/unity-volume,target=/unity-volume \ + -p 5005:5005 \ + -p 6006:6006 \ + :latest \ + \ + --env= \ + --train \ + --run-id= +``` + +Notes on argument values: + +- `` is used to identify the container (in case you want to + interrupt and terminate it). This is optional and Docker will generate a + random name if this is not set. _Note that this must be unique for every run + of a Docker image._ +- `` references the image name used when building the container. +- `` **(Optional)**: If you are training with a linux + executable, this is the name of the executable. If you are training in the + Editor, do not pass a `` argument and press the **Play** + button in Unity when the message _"Start training by pressing the Play button + in the Unity Editor"_ is displayed on the screen. +- `source`: Reference to the path in your host OS where you will store the Unity + executable. +- `target`: Tells Docker to mount the `source` path as a disk with this name. +- `trainer-config-file`, `train`, `run-id`: ML-Agents arguments passed to + `mlagents-learn`. 
`trainer-config-file` is the filename of the trainer config + file, `train` trains the algorithm, and `run-id` is used to tag each + experiment with a unique identifier. We recommend placing the trainer-config + file inside `unity-volume` so that the container has access to the file. + +To train with a `3DBall` environment executable, the command would be: + +```sh +docker run -it --name 3DBallContainer.first.trial \ + --mount type=bind,source="$(pwd)"/unity-volume,target=/unity-volume \ + -p 5005:5005 \ + -p 6006:6006 \ + balance.ball.v0.1:latest 3DBall \ + /unity-volume/trainer_config.yaml \ + --env=/unity-volume/3DBall \ + --train \ + --run-id=3dball_first_trial +``` + +For more detail on Docker mounts, check out +[these](https://docs.docker.com/storage/bind-mounts/) docs from Docker. + +**NOTE** If you are training using docker for environments that use visual +observations, you may need to increase the default memory that Docker allocates +for the container. For example, see +[here](https://docs.docker.com/docker-for-mac/#advanced) for instructions for +Docker for Mac. + +### Running Tensorboard + +You can run Tensorboard to monitor your training instance on +http://localhost:6006: + +```sh +docker exec -it tensorboard --logdir /unity-volume/results --host 0.0.0.0 +``` + +With our previous 3DBall example, this command would look like this: + +```sh +docker exec -it 3DBallContainer.first.trial tensorboard --logdir /unity-volume/results --host 0.0.0.0 +``` + +For more details on Tensorboard, check out the documentation about +[Using Tensorboard](Using-Tensorboard.md). + +### Stopping Container and Saving State + +If you are satisfied with the training progress, you can stop the Docker +container while saving state by either using `Ctrl+C` or `⌘+C` (Mac) or by using +the following command: + +```sh +docker kill --signal=SIGINT +``` + +`` is the name of the container specified in the earlier +`docker run` command. If you didn't specify one, you can find the randomly +generated identifier by running `docker container ls`. diff --git a/com.unity.ml-agents/Documentation~/Using-Tensorboard.md b/com.unity.ml-agents/Documentation~/Using-Tensorboard.md new file mode 100644 index 0000000000..d1bf469e91 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Using-Tensorboard.md @@ -0,0 +1,136 @@ +# Using TensorBoard to Observe Training + +The ML-Agents Toolkit saves statistics during learning session that you can view +with a TensorFlow utility named, +[TensorBoard](https://www.tensorflow.org/tensorboard). + +The `mlagents-learn` command saves training statistics to a folder named +`results`, organized by the `run-id` value you assign to a training session. + +In order to observe the training process, either during training or afterward, +start TensorBoard: + +1. Open a terminal or console window: +1. Navigate to the directory where the ML-Agents Toolkit is installed. +1. From the command line run: `tensorboard --logdir results --port 6006` +1. Open a browser window and navigate to + [localhost:6006](http://localhost:6006). + +**Note:** The default port TensorBoard uses is 6006. If there is an existing +session running on port 6006 a new session can be launched on an open port using +the --port option. + +**Note:** If you don't assign a `run-id` identifier, `mlagents-learn` uses the +default string, "ppo". You can delete the folders under the `results` directory +to clear out old statistics. 
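
If you prefer to inspect the saved statistics programmatically (for example, in
a notebook), the event files that `mlagents-learn` writes under the `results`
directory can also be read with TensorBoard's Python API. The snippet below is
a minimal sketch: it assumes the `tensorboard` package is installed and that
you point it at the directory that actually contains the event files (depending
on the ML-Agents version this may be a per-behavior subdirectory of your
run-id folder).

```python
# Minimal sketch: read ML-Agents scalars straight from the TensorBoard event
# files. Assumes the `tensorboard` package is installed and that this path is
# the directory containing the events.out.* files for your run ("ppo" is the
# default run-id).
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

accumulator = EventAccumulator("results/ppo")
accumulator.Reload()  # parse the event files on disk

# List the scalar tags that were logged, e.g. "Environment/Cumulative Reward".
print(accumulator.Tags()["scalars"])

# Print (step, value) pairs for the mean cumulative episode reward.
for event in accumulator.Scalars("Environment/Cumulative Reward"):
    print(event.step, event.value)
```
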
+ +On the left side of the TensorBoard window, you can select which of the training +runs you want to display. You can select multiple run-ids to compare statistics. +The TensorBoard window also provides options for how to display and smooth +graphs. + +## The ML-Agents Toolkit training statistics + +The ML-Agents training program saves the following statistics: + +![Example TensorBoard Run](images/mlagents-TensorBoard.png) + +### Environment Statistics + +- `Environment/Lesson` - Plots the progress from lesson to lesson. Only + interesting when performing curriculum training. + +- `Environment/Cumulative Reward` - The mean cumulative episode reward over all + agents. Should increase during a successful training session. + +- `Environment/Episode Length` - The mean length of each episode in the + environment for all agents. + +### Is Training + +- `Is Training` - A boolean indicating if the agent is updating its model. + +### Policy Statistics + +- `Policy/Entropy` (PPO; SAC) - How random the decisions of the model are. + Should slowly decrease during a successful training process. If it decreases + too quickly, the `beta` hyperparameter should be increased. + +- `Policy/Learning Rate` (PPO; SAC) - How large a step the training algorithm + takes as it searches for the optimal policy. Should decrease over time. + +- `Policy/Entropy Coefficient` (SAC) - Determines the relative importance of the + entropy term. This value is adjusted automatically so that the agent retains + some amount of randomness during training. + +- `Policy/Extrinsic Reward` (PPO; SAC) - This corresponds to the mean cumulative + reward received from the environment per-episode. + +- `Policy/Value Estimate` (PPO; SAC) - The mean value estimate for all states + visited by the agent. Should increase during a successful training session. + +- `Policy/Curiosity Reward` (PPO/SAC+Curiosity) - This corresponds to the mean + cumulative intrinsic reward generated per-episode. + +- `Policy/Curiosity Value Estimate` (PPO/SAC+Curiosity) - The agent's value + estimate for the curiosity reward. + +- `Policy/GAIL Reward` (PPO/SAC+GAIL) - This corresponds to the mean cumulative + discriminator-based reward generated per-episode. + +- `Policy/GAIL Value Estimate` (PPO/SAC+GAIL) - The agent's value estimate for + the GAIL reward. + +- `Policy/GAIL Policy Estimate` (PPO/SAC+GAIL) - The discriminator's estimate + for states and actions generated by the policy. + +- `Policy/GAIL Expert Estimate` (PPO/SAC+GAIL) - The discriminator's estimate + for states and actions drawn from expert demonstrations. + +### Learning Loss Functions + +- `Losses/Policy Loss` (PPO; SAC) - The mean magnitude of policy loss function. + Correlates to how much the policy (process for deciding actions) is changing. + The magnitude of this should decrease during a successful training session. + +- `Losses/Value Loss` (PPO; SAC) - The mean loss of the value function update. + Correlates to how well the model is able to predict the value of each state. + This should increase while the agent is learning, and then decrease once the + reward stabilizes. + +- `Losses/Forward Loss` (PPO/SAC+Curiosity) - The mean magnitude of the forward + model loss function. Corresponds to how well the model is able to predict the + new observation encoding. + +- `Losses/Inverse Loss` (PPO/SAC+Curiosity) - The mean magnitude of the inverse + model loss function. Corresponds to how well the model is able to predict the + action taken between two observations. 
+ +- `Losses/Pretraining Loss` (BC) - The mean magnitude of the behavioral cloning + loss. Corresponds to how well the model imitates the demonstration data. + +- `Losses/GAIL Loss` (GAIL) - The mean magnitude of the GAIL discriminator loss. + Corresponds to how well the model imitates the demonstration data. + +### Self-Play + +- `Self-Play/ELO` (Self-Play) - + [ELO](https://en.wikipedia.org/wiki/Elo_rating_system) measures the relative + skill level between two players. In a proper training run, the ELO of the + agent should steadily increase. + +## Exporting Data from TensorBoard +To export timeseries data in CSV or JSON format, check the "Show data download +links" in the upper left. This will enable download links below each chart. + +![Example TensorBoard Run](images/TensorBoard-download.png) + +## Custom Metrics from Unity + +To get custom metrics from a C# environment into TensorBoard, you can use the +`StatsRecorder`: + +```csharp +var statsRecorder = Academy.Instance.StatsRecorder; +statsRecorder.Add("MyMetric", 1.0); +``` diff --git a/com.unity.ml-agents/Documentation~/Using-Virtual-Environment.md b/com.unity.ml-agents/Documentation~/Using-Virtual-Environment.md new file mode 100644 index 0000000000..6a3415ad31 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Using-Virtual-Environment.md @@ -0,0 +1,73 @@ +# Using Virtual Environment + +## What is a Virtual Environment? + +A Virtual Environment is a self contained directory tree that contains a Python +installation for a particular version of Python, plus a number of additional +packages. To learn more about Virtual Environments see +[here](https://docs.python.org/3/library/venv.html). + +## Why should I use a Virtual Environment? + +A Virtual Environment keeps all dependencies for the Python project separate +from dependencies of other projects. This has a few advantages: + +1. It makes dependency management for the project easy. +1. It enables using and testing of different library versions by quickly + spinning up a new environment and verifying the compatibility of the code + with the different version. + +## Python Version Requirement (Required) +This guide has been tested with Python 3.10.12. Newer versions might not +have support for the dependent libraries, so are not recommended. + +## Use Conda (or Mamba) + +While there are many options for setting up virtual environments for python, by far the most common and simpler approach is by using Anaconda (aka Conda). You can read the documentation on how to get started with Conda [here](https://learning.anaconda.cloud/get-started-with-anaconda). + +## Installing Pip (Required) + +1. Download the `get-pip.py` file using the command + `curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py` +1. Run the following `python3 get-pip.py` +1. Check pip version using `pip3 -V` + +Note (for Ubuntu users): If the +`ModuleNotFoundError: No module named 'distutils.util'` error is encountered, +then python3-distutils needs to be installed. Install python3-distutils using +`sudo apt-get install python3-distutils` + +## Mac OS X Setup + +1. Create a folder where the virtual environments will reside + `$ mkdir ~/python-envs` +1. To create a new environment named `sample-env` execute + `$ python3 -m venv ~/python-envs/sample-env` +1. To activate the environment execute + `$ source ~/python-envs/sample-env/bin/activate` +1. Upgrade to the latest pip version using `$ pip3 install --upgrade pip` +1. Upgrade to the latest setuptools version using + `$ pip3 install --upgrade setuptools` +1. 
To deactivate the environment execute `$ deactivate` (you can reactivate the + environment using the same `activate` command listed above) + +## Ubuntu Setup + +1. Install the python3-venv package using `$ sudo apt-get install python3-venv` +1. Follow the steps in the Mac OS X installation. + +## Windows Setup + +1. Create a folder where the virtual environments will reside `md python-envs` +1. To create a new environment named `sample-env` execute + `python -m venv python-envs\sample-env` +1. To activate the environment execute `python-envs\sample-env\Scripts\activate` +1. Upgrade to the latest pip version using `pip install --upgrade pip` +1. To deactivate the environment execute `deactivate` (you can reactivate the + environment using the same `activate` command listed above) + +Note: +- Verify that you are using Python version 3.10.12. Launch a + command prompt using `cmd` and execute `python --version` to verify the version. +- Python3 installation may require admin privileges on Windows. +- This guide is for Windows 10 using a 64-bit architecture only. diff --git a/com.unity.ml-agents/Documentation~/Versioning.md b/com.unity.ml-agents/Documentation~/Versioning.md new file mode 100644 index 0000000000..0f1db0a062 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/Versioning.md @@ -0,0 +1,95 @@ +# ML-Agents Versioning + +## Context +As the ML-Agents project evolves into a more mature product, we want to communicate the process +we use to version our packages and the data that flows into, through, and out of them clearly. +Our project now has four packages (1 Unity, 3 Python) along with artifacts that are produced as +well as consumed. This document covers the versioning for these packages and artifacts. + +## GitHub Releases +Up until now, all packages were in lockstep in-terms of versioning. As a result, the GitHub releases +were tagged with the version of all those packages (e.g. v0.15.0, v0.15.1) and labeled accordingly. +With the decoupling of package versions, we now need to revisit our GitHub release tagging. +The proposal is that we move towards an integer release numbering for our repo and each such +release will call out specific version upgrades of each package. For instance, with +[the April 30th release](https://github.com/Unity-Technologies/ml-agents/releases/tag/release_1), +we will have: +- GitHub Release 1 (branch name: *release_1_branch*) + - com.unity.ml-agents release 1.0.0 + - ml-agents release 0.16.0 + - ml-agents-envs release 0.16.0 + - gym-unity release 0.16.0 + +Our release cadence will not be affected by these versioning changes. We will keep having +monthly releases to fix bugs and release new features. + +## Packages +All of the software packages, and their generated artifacts will be versioned. Any automation +tools will not be versioned. + +### Unity package +Package name: com.unity.ml-agents +- Versioned following [Semantic Versioning Guidelines](https://www.semver.org) +- This package consumes an artifact of the training process: the `.nn` file. These files + are integer versioned and currently at version 2. The com.unity.ml-agents package + will need to support the version of `.nn` files which existed at its 1.0.0 release. + For example, consider that com.unity.ml-agents is at version 1.0.0 and the NN files + are at version 2. If the NN files change to version 3, the next release of + com.unity.ml-agents at version 1.1.0 guarantees it will be able to read both of these + formats. 
If the NN files were to change to version 4 and com.unity.ml-agents to + version 2.0.0, support for NN versions 2 and 3 could be dropped for com.unity.ml-agents + version 2.0.0. +- This package produces one artifact, the `.demo` files. These files will have integer + versioning. This means their version will increment by 1 at each change. The + com.unity.ml-agents package must be backward compatible with version changes + that occur between minor versions. +- To summarize, the artifacts produced and consumed by com.unity.ml-agents are guaranteed + to be supported for 1.x.x versions of com.unity.ml-agents. We intend to provide stability + for our users by moving to a 1.0.0 release of com.unity.ml-agents. + + +### Python Packages +Package names: ml-agents / ml-agents-envs / gym-unity +- The python packages remain in "Beta." This means that breaking changes to the public + API of the python packages can change without having to have a major version bump. + Historically, the python and C# packages were in version lockstep. This is no longer + the case. The python packages will remain in lockstep with each other for now, while the + C# package will follow its own versioning as is appropriate. However, the python package + versions may diverge in the future. +- While the python packages will remain in Beta for now, we acknowledge that the most + heavily used portion of our python interface is the `mlagents-learn` CLI and strive + to make this part of our API backward compatible. We are actively working on this and + expect to have a stable CLI in the next few weeks. + +## Communicator + +Packages which communicate: com.unity.ml-agents / ml-agents-envs + +Another entity of the ML-Agents Toolkit that requires versioning is the communication layer +between C# and Python, which will follow also semantic versioning. This guarantees a level of +backward compatibility between different versions of C# and Python packages which communicate. +Any Communicator version 1.x.x of the Unity package should be compatible with any 1.x.x +Communicator Version in Python. + +An RLCapabilities struct keeps track of which features exist. This struct is passed from C# to +Python, and another from Python to C#. With this feature level granularity, we can notify users +more specifically about feature limitations based on what's available in both C# and Python. +These notifications will be logged to the python terminal, or to the Unity Editor Console. + + +## Side Channels + +The communicator is what manages data transfer between Unity and Python for the core +training loop. Side Channels are another means of data transfer between Unity and Python. +Side Channels are not versioned, but have been designed to support backward compatibility +for what they are. As of today, we provide 4 side channels: +- FloatProperties: shared float data between Unity - Python (bidirectional) +- RawBytes: raw data that can be sent Unity - Python (bidirectional) +- EngineConfig: a set of numeric fields in a pre-defined order sent from Python to Unity +- Stats: (name, value, agg) messages sent from Unity to Python + +Aside from the specific implementations of side channels we provide (and use ourselves), +the Side Channel interface is made available for users to create their own custom side +channels. As such, we guarantee that the built in SideChannel interface between Unity and +Python is backward compatible in packages that share the same major version. 
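
As an illustration of the pattern, the EngineConfig channel listed above is exposed in
ml-agents-envs as `EngineConfigurationChannel`. The sketch below shows the usual way a side
channel is created and handed to a `UnityEnvironment`; `file_name=None` is assumed here so that
the environment connects to the Unity Editor rather than a built executable.

```python
# A short sketch of using the EngineConfig side channel from the Python side.
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.engine_configuration_channel import (
    EngineConfigurationChannel,
)

channel = EngineConfigurationChannel()

# file_name=None connects to the Unity Editor instead of a built executable.
env = UnityEnvironment(file_name=None, side_channels=[channel])

# Engine configuration values travel from Python to Unity over the side channel.
channel.set_configuration_parameters(time_scale=20.0, target_frame_rate=-1)

env.reset()
env.close()
```
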
+ diff --git a/com.unity.ml-agents/Documentation~/com.unity.ml-agents.md b/com.unity.ml-agents/Documentation~/com.unity.ml-agents.md index 959f5edb75..f05e083122 100644 --- a/com.unity.ml-agents/Documentation~/com.unity.ml-agents.md +++ b/com.unity.ml-agents/Documentation~/com.unity.ml-agents.md @@ -1,7 +1,10 @@ -# ML-Agents Overview +# Unity ML-Agents Toolkit + + + ML-agents enable games and simulations to serve as environments for training intelligent agents in Unity. Training can be done with reinforcement learning, imitation learning, neuroevolution, or any other methods. Trained agents can be used for many use cases, including controlling NPC behavior (in a variety of settings such as multi-agent and adversarial), automated testing of game builds and evaluating different game design decisions pre-release. -The _ML-Agents_ package has a C# SDK for the [Unity ML-Agents Toolkit], which can be used outside of Unity. The scope of these docs is just to get started in the context of Unity, but further details and samples are located on the [github docs]. +The _ML-Agents_ package has a C# SDK for the [Unity ML-Agents Toolkit], which can be used outside of Unity. This package now contains comprehensive documentation including detailed training guides, Python API documentation, and advanced features. ## Capabilities The package allows you to convert any Unity scene into a learning environment and train character behaviors using a variety of machine-learning algorithms. Additionally, it allows you to embed these trained behaviors back into Unity scenes to control your characters. More specifically, the package provides the following core functionalities: @@ -12,7 +15,7 @@ The package allows you to convert any Unity scene into a learning environment an * Embed a trained behavior (aka: run your ML model) in the scene via the [Unity Inference Engine]. Embedded behaviors allow you to switch an Agent between learning and inference. ## Special Notes -Note that the ML-Agents package does not contain the machine learning algorithms for training behaviors. The ML-Agents package only supports instrumenting a Unity scene, setting it up for training, and then embedding the trained model back into your Unity scene. The machine learning algorithms that orchestrate training are part of the companion [python package]. +Note that the ML-Agents package does not contain the machine learning algorithms for training behaviors. The ML-Agents package only supports instrumenting a Unity scene, setting it up for training, and then embedding the trained model back into your Unity scene. The machine learning algorithms that orchestrate training are part of the companion Python package. For detailed training instructions, see [Training ML-Agents](Training-ML-Agents.md). ## Package contents @@ -38,116 +41,7 @@ To add the ML-Agents package to a Unity project: * Enter com.unity.ml-agents *Click Add to add the package to your project. -To install the companion Python package to enable training behaviors, follow the [installation instructions] on our [GitHub repository]. - -## Advanced Features - -### Custom Grid Sensors - -Grid Sensor provides a 2D observation that detects objects around an agent from a top-down view. Compared to RayCasts, it receives a full observation in a grid area without gaps, and the detection is not blocked by objects around the agents. This gives a more granular view while requiring a higher usage of compute resources. 
- -One extra feature with Grid Sensors is that you can derive from the Grid Sensor base class to collect custom data besides the object tags, to include custom attributes as observations. This allows more flexibility for the use of GridSensor. - -#### Creating Custom Grid Sensors -To create a custom grid sensor, you'll need to derive from two classes: `GridSensorBase` and `GridSensorComponent`. - -##### Deriving from `GridSensorBase` -This is the implementation of your sensor. This defines how your sensor process detected colliders, -what the data looks like, and how the observations are constructed from the detected objects. -Consider overriding the following methods depending on your use case: -* `protected virtual int GetCellObservationSize()`: Return the observation size per cell. Default to `1`. -* `protected virtual void GetObjectData(GameObject detectedObject, int tagIndex, float[] dataBuffer)`: Constructs observations from the detected object. The input provides the detected GameObject and the index of its tag (0-indexed). The observations should be written to the given `dataBuffer` and the buffer size is defined in `GetCellObservationSize()`. This data will be gathered from each cell and sent to the trainer as observation. -* `protected virtual bool IsDataNormalized()`: Return whether the observation is normalized to 0~1. This affects whether you're able to use compressed observations as compressed data only supports normalized data. Return `true` if all the values written in `GetObjectData` are within the range of (0, 1), otherwise return `false`. Default to `false`. - - There might be cases when your data is not in the range of (0, 1) but you still wish to use compressed data to speed up training. If your data is naturally bounded within a range, normalize your data first to the possible range and fill the buffer with normalized data. For example, since the angle of rotation is bounded within `0 ~ 360`, record an angle `x` as `x/360` instead of `x`. If your data value is not bounded (position, velocity, etc.), consider setting a reasonable min/max value and use that to normalize your data. -* `protected internal virtual ProcessCollidersMethod GetProcessCollidersMethod()`: Return the method to process colliders detected in a cell. This defines the sensor behavior when multiple objects with detectable tags are detected within a cell. -Currently two methods are provided: - * `ProcessCollidersMethod.ProcessClosestColliders` (Default): Process the closest collider to the agent. In this case each cell's data is represented by one object. - * `ProcessCollidersMethod.ProcessAllColliders`: Process all detected colliders. This is useful when the data from each cell is additive, for instance, the count of detected objects in a cell. When using this option, the input `dataBuffer` in `GetObjectData()` will contain processed data from other colliders detected in the cell. You'll more likely want to add/subtract values from the buffer instead of overwrite it completely. - -##### Deriving from `GridSensorComponent` -To create your sensor, you need to override the sensor component and add your sensor to the creation. -Specifically, you need to override `GetGridSensors()` and return an array of grid sensors you want to use in the component. -It can be used to create multiple different customized grid sensors, or you can also include the ones provided in our package (listed in the next section). 
- -Example: -```csharp -public class CustomGridSensorComponent : GridSensorComponent -{ - protected override GridSensorBase[] GetGridSensors() - { - return new GridSensorBase[] { new CustomGridSensor(...)}; - } -} -``` - -#### Grid Sensor Types -Here we list out two types of grid sensor provided in the package: `OneHotGridSensor` and `CountingGridSensor`. -Their implementations are also a good reference for making you own ones. - -##### OneHotGridSensor -This is the default sensor used by `GridSensorComponent`. It detects objects with detectable tags and the observation is the one-hot representation of the detected tag index. - -The implementation of the sensor is defined as following: -* `GetCellObservationSize()`: `detectableTags.Length` -* `IsDataNormalized()`: `true` -* `ProcessCollidersMethod()`: `ProcessCollidersMethod.ProcessClosestColliders` -* `GetObjectData()`: - -```csharp -protected override void GetObjectData(GameObject detectedObject, int tagIndex, float[] dataBuffer) -{ - dataBuffer[tagIndex] = 1; -} -``` - -##### CountingGridSensor -This is an example of using all colliders detected in a cell. It counts the number of objects detected for each detectable tag. The sensor cannot be used with data compression. - -The implementation of the sensor is defined as following: -* `GetCellObservationSize()`: `detectableTags.Length` -* `IsDataNormalized()`: `false` -* `ProcessCollidersMethod()`: `ProcessCollidersMethod.ProcessAllColliders` -* `GetObjectData()`: - -```csharp -protected override void GetObjectData(GameObject detectedObject, int tagIndex, float[] dataBuffer) -{ - dataBuffer[tagIndex] += 1; -} -``` - -### Input System Integration - -The ML-Agents package integrates with the [Input System Package](https://docs.unity3d.com/Packages/com.unity.inputsystem@1.1/manual/QuickStartGuide.html) through the `InputActuatorComponent`. This component sets up an action space for your `Agent` based on an `InputActionAsset` that is referenced by the `IInputActionAssetProvider` interface, or the `PlayerInput` component that may be living on your player controlled `Agent`. This means that if you have code outside of your agent that handles input, you will not need to implement the Heuristic function in agent as well. The `InputActuatorComponent` will handle this for you. You can now train and run inference on `Agents` with an action space defined by an `InputActionAsset`. - -Take a look at how we have implemented the C# code in the example Input Integration scene (located under Project/Assets/ML-Agents/Examples/PushBlockWithInput/). Once you have some familiarity, then the next step would be to add the InputActuatorComponent to your player Agent. The example we have implemented uses C# Events to send information from the Input System. - -#### Getting Started with Input System Integration -1. Add the `com.unity.inputsystem` version 1.1.0-preview.3 or later to your project via the Package Manager window. -2. If you have already setup an InputActionAsset skip to Step 3, otherwise follow these sub steps: - 1. Create an InputActionAsset to allow your Agent to be controlled by the Input System. - 2. Handle the events from the Input System where you normally would (i.e. a script external to your Agent class). -3. Add the InputSystemActuatorComponent to the GameObject that has the `PlayerInput` and `Agent` components attached. - -Additionally, see below for additional technical specifications on the C# code for the InputActuatorComponent. 
-#### Technical Specifications - -##### `IInputActionsAssetProvider` Interface -The `InputActuatorComponent` searches for a `Component` that implements -`IInputActionAssetProvider` on the `GameObject` they both are attached to. It is important to note -that if multiple `Components` on your `GameObject` need to access an `InputActionAsset` to handle events, -they will need to share the same instance of the `InputActionAsset` that is returned from the -`IInputActionAssetProvider`. - -##### `InputActuatorComponent` Class -The `InputActuatorComponent` is the bridge between ML-Agents and the Input System. It allows ML-Agents to: -* create an `ActionSpec` for your Agent based on an `InputActionAsset` that comes from an -`IInputActionAssetProvider`. -* send simulated input from a training process or a neural network -* let developers keep their input handling code in one place - -This is accomplished by adding the `InputActuatorComponent` to an Agent which already has the PlayerInput component attached. +To install the companion Python package to enable training behaviors, follow the [Installation](Installation.md) instructions. ## Known Limitations @@ -172,26 +66,62 @@ Currently the speed of the game physics can only be increased to 100x real-time. You can control the frequency of Academy stepping by calling `Academy.Instance.DisableAutomaticStepping()`, and then calling `Academy.Instance.EnvironmentStep()`. -### Input System Integration - - For the `InputActuatorComponent` - - Limited implementation of `InputControls` - - No way to customize the action space of the `InputActuatorComponent` +## Complete Documentation + +Welcome to the comprehensive Unity ML-Agents Toolkit documentation. This documentation is now fully integrated within the Unity package for easy access. 
+ +## Quick Navigation + +### Getting Started +- **[Getting Started](Getting-Started.md)** - Step-by-step setup guide +- **[Installation](Installation.md)** - Complete installation instructions +- **[ML-Agents Overview](ML-Agents-Overview.md)** - Comprehensive feature overview + +### Creating Learning Environments +- **[Learning Environment Design](Learning-Environment-Design.md)** - Design principles +- **[Creating New Learning Environments](Learning-Environment-Create-New.md)** - Step-by-step creation +- **[Designing Agents](Learning-Environment-Design-Agents.md)** - Agent design guide +- **[Learning Environment Examples](Learning-Environment-Examples.md)** - Example environments + +### Training & Configuration +- **[Training ML-Agents](Training-ML-Agents.md)** - Complete training guide +- **[Training Configuration File](Training-Configuration-File.md)** - Configuration reference +- **[Training Plugins](Training-Plugins.md)** - Extending training functionality +- **[Using Tensorboard](Using-Tensorboard.md)** - Monitoring training + +### Python APIs +- **[Python Gym API](Python-Gym-API.md)** - OpenAI Gym interface +- **[Python PettingZoo API](Python-PettingZoo-API.md)** - Multi-agent environments +- **[Python Low-Level API](Python-LLAPI.md)** - Low-level API access + +### Advanced Features +- **[Custom Side Channels](Custom-SideChannels.md)** - Custom communication +- **[Inference Engine](Inference-Engine.md)** - Running trained models +- **[Hugging Face Integration](Hugging-Face-Integration.md)** - Model sharing +- **[Custom Grid Sensors](Custom-GridSensors.md)** - Custom grid sensors +- **[Input System Integration](InputSystem-Integration.md)** - Unity Input System + +### Cloud & Deployment +- **[Using Docker](Using-Docker.md)** - Containerized training +- **[Amazon Web Services](Training-on-Amazon-Web-Service.md)** - AWS deployment +- **[Microsoft Azure](Training-on-Microsoft-Azure.md)** - Azure deployment + +### Reference & Support +- **[FAQ](FAQ.md)** - Frequently asked questions +- **[Limitations](Limitations.md)** - Known limitations +- **[Migrating](Migrating.md)** - Migration between versions +- **[Background: Machine Learning](Background-Machine-Learning.md)** - ML fundamentals +- **[Background: Unity](Background-Unity.md)** - Unity concepts +- **[Background: PyTorch](Background-PyTorch.md)** - PyTorch fundamentals ## Additional Resources -* [GitHub repository] -* [Unity Discussions] -* [Discord] -* [Website] +For the most up-to-date information and community support, visit our [GitHub repository](https://github.com/Unity-Technologies/ml-agents). 
+ +* [Unity Discussions](https://discussions.unity.com/tag/ml-agents) +* [Discord](https://discord.com/channels/489222168727519232/1202574086115557446) +* [Website](https://unity-technologies.github.io/ml-agents/) -[github docs]: https://unity-technologies.github.io/ml-agents/ -[installation instructions]: https://github.com/Unity-Technologies/ml-agents/blob/release_22_docs/docs/Installation.md [Unity Inference Engine]: https://docs.unity3d.com/Packages/com.unity.ai.inference@2.2/manual/index.html -[python package]: https://github.com/Unity-Technologies/ml-agents -[GitHub repository]: https://github.com/Unity-Technologies/ml-agents [Execution Order of Event Functions]: https://docs.unity3d.com/Manual/ExecutionOrder.html -[Unity Discussions]: https://discussions.unity.com/tag/ml-agents -[Discord]: https://discord.com/channels/489222168727519232/1202574086115557446 -[Website]: https://unity-technologies.github.io/ml-agents/ diff --git a/com.unity.ml-agents/Documentation~/dox-ml-agents.conf b/com.unity.ml-agents/Documentation~/dox-ml-agents.conf new file mode 100644 index 0000000000..bfdb11ead7 --- /dev/null +++ b/com.unity.ml-agents/Documentation~/dox-ml-agents.conf @@ -0,0 +1,2444 @@ +# Doxyfile 1.8.13 + +# To generate the C# API documentation, run: +# +# doxygen dox-ml-agents.conf +# +# from the ml-agents-docs directory + +#--------------------------------------------------------------------------- +# Project related configuration options +#--------------------------------------------------------------------------- + +# This tag specifies the encoding used for all characters in the config file +# that follow. The default is UTF-8 which is also the encoding used for all text +# before the first occurrence of this tag. Doxygen uses libiconv (or the iconv +# built into libc) for the transcoding. See http://www.gnu.org/software/libiconv +# for the list of possible encodings. +# The default value is: UTF-8. + +DOXYFILE_ENCODING = UTF-8 + +# The PROJECT_NAME tag is a single word (or a sequence of words surrounded by +# double-quotes, unless you are using Doxywizard) that should identify the +# project for which the documentation is generated. This name is used in the +# title of most generated pages and in a few other places. +# The default value is: My Project. + +PROJECT_NAME = "Unity ML-Agents Toolkit" + +# The PROJECT_NUMBER tag can be used to enter a project or revision number. This +# could be handy for archiving the generated documentation or if some version +# control system is used. + +PROJECT_NUMBER = + +# Using the PROJECT_BRIEF tag one can provide an optional one line description +# for a project that appears at the top of each page and should give viewer a +# quick idea about the purpose of the project. Keep the description short. + +PROJECT_BRIEF = + +# With the PROJECT_LOGO tag one can specify a logo or an icon that is included +# in the documentation. The maximum height of the logo should not exceed 55 +# pixels and the maximum width should not exceed 200 pixels. Doxygen will copy +# the logo to the output directory. + +PROJECT_LOGO = doxygen/logo.png + +# The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute) path +# into which the generated documentation will be written. If a relative path is +# entered, it will be relative to the location where doxygen was started. If +# left blank the current directory will be used. 
+ +OUTPUT_DIRECTORY = + +# If the CREATE_SUBDIRS tag is set to YES then doxygen will create 4096 sub- +# directories (in 2 levels) under the output directory of each output format and +# will distribute the generated files over these directories. Enabling this +# option can be useful when feeding doxygen a huge amount of source files, where +# putting all generated files in the same directory would otherwise causes +# performance problems for the file system. +# The default value is: NO. + +CREATE_SUBDIRS = NO + +# If the ALLOW_UNICODE_NAMES tag is set to YES, doxygen will allow non-ASCII +# characters to appear in the names of generated files. If set to NO, non-ASCII +# characters will be escaped, for example _xE3_x81_x84 will be used for Unicode +# U+3044. +# The default value is: NO. + +ALLOW_UNICODE_NAMES = NO + +# The OUTPUT_LANGUAGE tag is used to specify the language in which all +# documentation generated by doxygen is written. Doxygen will use this +# information to generate all constant output in the proper language. +# Possible values are: Afrikaans, Arabic, Armenian, Brazilian, Catalan, Chinese, +# Chinese-Traditional, Croatian, Czech, Danish, Dutch, English (United States), +# Esperanto, Farsi (Persian), Finnish, French, German, Greek, Hungarian, +# Indonesian, Italian, Japanese, Japanese-en (Japanese with English messages), +# Korean, Korean-en (Korean with English messages), Latvian, Lithuanian, +# Macedonian, Norwegian, Persian (Farsi), Polish, Portuguese, Romanian, Russian, +# Serbian, Serbian-Cyrillic, Slovak, Slovene, Spanish, Swedish, Turkish, +# Ukrainian and Vietnamese. +# The default value is: English. + +OUTPUT_LANGUAGE = English + +# If the BRIEF_MEMBER_DESC tag is set to YES, doxygen will include brief member +# descriptions after the members that are listed in the file and class +# documentation (similar to Javadoc). Set to NO to disable this. +# The default value is: YES. + +BRIEF_MEMBER_DESC = YES + +# If the REPEAT_BRIEF tag is set to YES, doxygen will prepend the brief +# description of a member or function before the detailed description +# +# Note: If both HIDE_UNDOC_MEMBERS and BRIEF_MEMBER_DESC are set to NO, the +# brief descriptions will be completely suppressed. +# The default value is: YES. + +REPEAT_BRIEF = YES + +# This tag implements a quasi-intelligent brief description abbreviator that is +# used to form the text in various listings. Each string in this list, if found +# as the leading text of the brief description, will be stripped from the text +# and the result, after processing the whole list, is used as the annotated +# text. Otherwise, the brief description is used as-is. If left blank, the +# following values are used ($name is automatically replaced with the name of +# the entity):The $name class, The $name widget, The $name file, is, provides, +# specifies, contains, represents, a, an and the. + +ABBREVIATE_BRIEF = "The $name class" \ + "The $name widget" \ + "The $name file" \ + is \ + provides \ + specifies \ + contains \ + represents \ + a \ + an \ + the + +# If the ALWAYS_DETAILED_SEC and REPEAT_BRIEF tags are both set to YES then +# doxygen will generate a detailed section even if there is only a brief +# description. +# The default value is: NO. + +ALWAYS_DETAILED_SEC = NO + +# If the INLINE_INHERITED_MEMB tag is set to YES, doxygen will show all +# inherited members of a class in the documentation of that class as if those +# members were ordinary class members. 
Constructors, destructors and assignment +# operators of the base classes will not be shown. +# The default value is: NO. + +INLINE_INHERITED_MEMB = NO + +# If the FULL_PATH_NAMES tag is set to YES, doxygen will prepend the full path +# before files name in the file list and in the header files. If set to NO the +# shortest path that makes the file name unique will be used +# The default value is: YES. + +FULL_PATH_NAMES = NO + +# The STRIP_FROM_PATH tag can be used to strip a user-defined part of the path. +# Stripping is only done if one of the specified strings matches the left-hand +# part of the path. The tag can be used to show relative paths in the file list. +# If left blank the directory from which doxygen is run is used as the path to +# strip. +# +# Note that you can specify absolute paths here, but also relative paths, which +# will be relative from the directory where doxygen is started. +# This tag requires that the tag FULL_PATH_NAMES is set to YES. + +STRIP_FROM_PATH = + +# The STRIP_FROM_INC_PATH tag can be used to strip a user-defined part of the +# path mentioned in the documentation of a class, which tells the reader which +# header file to include in order to use a class. If left blank only the name of +# the header file containing the class definition is used. Otherwise one should +# specify the list of include paths that are normally passed to the compiler +# using the -I flag. + +STRIP_FROM_INC_PATH = + +# If the SHORT_NAMES tag is set to YES, doxygen will generate much shorter (but +# less readable) file names. This can be useful is your file systems doesn't +# support long names like on DOS, Mac, or CD-ROM. +# The default value is: NO. + +SHORT_NAMES = NO + +# If the JAVADOC_AUTOBRIEF tag is set to YES then doxygen will interpret the +# first line (until the first dot) of a Javadoc-style comment as the brief +# description. If set to NO, the Javadoc-style will behave just like regular Qt- +# style comments (thus requiring an explicit @brief command for a brief +# description.) +# The default value is: NO. + +JAVADOC_AUTOBRIEF = YES + +# If the QT_AUTOBRIEF tag is set to YES then doxygen will interpret the first +# line (until the first dot) of a Qt-style comment as the brief description. If +# set to NO, the Qt-style will behave just like regular Qt-style comments (thus +# requiring an explicit \brief command for a brief description.) +# The default value is: NO. + +QT_AUTOBRIEF = NO + +# The MULTILINE_CPP_IS_BRIEF tag can be set to YES to make doxygen treat a +# multi-line C++ special comment block (i.e. a block of //! or /// comments) as +# a brief description. This used to be the default behavior. The new default is +# to treat a multi-line C++ comment block as a detailed description. Set this +# tag to YES if you prefer the old behavior instead. +# +# Note that setting this tag to YES also means that rational rose comments are +# not recognized any more. +# The default value is: NO. + +MULTILINE_CPP_IS_BRIEF = NO + +# If the INHERIT_DOCS tag is set to YES then an undocumented member inherits the +# documentation from any documented member that it re-implements. +# The default value is: YES. + +INHERIT_DOCS = YES + +# If the SEPARATE_MEMBER_PAGES tag is set to YES then doxygen will produce a new +# page for each member. If set to NO, the documentation of a member will be part +# of the file/class/namespace that contains it. +# The default value is: NO. + +SEPARATE_MEMBER_PAGES = NO + +# The TAB_SIZE tag can be used to set the number of spaces in a tab. 
Doxygen +# uses this value to replace tabs by spaces in code fragments. +# Minimum value: 1, maximum value: 16, default value: 4. + +TAB_SIZE = 4 + +# This tag can be used to specify a number of aliases that act as commands in +# the documentation. An alias has the form: +# name=value +# For example adding +# "sideeffect=@par Side Effects:\n" +# will allow you to put the command \sideeffect (or @sideeffect) in the +# documentation, which will result in a user-defined paragraph with heading +# "Side Effects:". You can put \n's in the value part of an alias to insert +# newlines. + +ALIASES = + +# This tag can be used to specify a number of word-keyword mappings (TCL only). +# A mapping has the form "name=value". For example adding "class=itcl::class" +# will allow you to use the command class in the itcl::class meaning. + +TCL_SUBST = + +# Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C sources +# only. Doxygen will then generate output that is more tailored for C. For +# instance, some of the names that are used will be different. The list of all +# members will be omitted, etc. +# The default value is: NO. + +OPTIMIZE_OUTPUT_FOR_C = NO + +# Set the OPTIMIZE_OUTPUT_JAVA tag to YES if your project consists of Java or +# Python sources only. Doxygen will then generate output that is more tailored +# for that language. For instance, namespaces will be presented as packages, +# qualified scopes will look different, etc. +# The default value is: NO. + +OPTIMIZE_OUTPUT_JAVA = YES + +# Set the OPTIMIZE_FOR_FORTRAN tag to YES if your project consists of Fortran +# sources. Doxygen will then generate output that is tailored for Fortran. +# The default value is: NO. + +OPTIMIZE_FOR_FORTRAN = NO + +# Set the OPTIMIZE_OUTPUT_VHDL tag to YES if your project consists of VHDL +# sources. Doxygen will then generate output that is tailored for VHDL. +# The default value is: NO. + +OPTIMIZE_OUTPUT_VHDL = NO + +# Doxygen selects the parser to use depending on the extension of the files it +# parses. With this tag you can assign which parser to use for a given +# extension. Doxygen has a built-in mapping, but you can override or extend it +# using this tag. The format is ext=language, where ext is a file extension, and +# language is one of the parsers supported by doxygen: IDL, Java, Javascript, +# C#, C, C++, D, PHP, Objective-C, Python, Fortran (fixed format Fortran: +# FortranFixed, free formatted Fortran: FortranFree, unknown formatted Fortran: +# Fortran. In the later case the parser tries to guess whether the code is fixed +# or free formatted code, this is the default for Fortran type files), VHDL. For +# instance to make doxygen treat .inc files as Fortran files (default is PHP), +# and .f files as C (default is Fortran), use: inc=Fortran f=C. +# +# Note: For files without extension you can use no_extension as a placeholder. +# +# Note that for custom extensions you also need to set FILE_PATTERNS otherwise +# the files are not read by doxygen. + +EXTENSION_MAPPING = + +# If the MARKDOWN_SUPPORT tag is enabled then doxygen pre-processes all comments +# according to the Markdown format, which allows for more readable +# documentation. See http://daringfireball.net/projects/markdown/ for details. +# The output of markdown processing is further processed by doxygen, so you can +# mix doxygen, HTML, and XML commands with Markdown formatting. Disable only in +# case of backward compatibilities issues. +# The default value is: YES. 
+ +MARKDOWN_SUPPORT = YES + +# When the TOC_INCLUDE_HEADINGS tag is set to a non-zero value, all headings up +# to that level are automatically included in the table of contents, even if +# they do not have an id attribute. +# Note: This feature currently applies only to Markdown headings. +# Minimum value: 0, maximum value: 99, default value: 0. +# This tag requires that the tag MARKDOWN_SUPPORT is set to YES. + +TOC_INCLUDE_HEADINGS = 0 + +# When enabled doxygen tries to link words that correspond to documented +# classes, or namespaces to their corresponding documentation. Such a link can +# be prevented in individual cases by putting a % sign in front of the word or +# globally by setting AUTOLINK_SUPPORT to NO. +# The default value is: YES. + +AUTOLINK_SUPPORT = YES + +# If you use STL classes (i.e. std::string, std::vector, etc.) but do not want +# to include (a tag file for) the STL sources as input, then you should set this +# tag to YES in order to let doxygen match functions declarations and +# definitions whose arguments contain STL classes (e.g. func(std::string); +# versus func(std::string) {}). This also make the inheritance and collaboration +# diagrams that involve STL classes more complete and accurate. +# The default value is: NO. + +BUILTIN_STL_SUPPORT = NO + +# If you use Microsoft's C++/CLI language, you should set this option to YES to +# enable parsing support. +# The default value is: NO. + +CPP_CLI_SUPPORT = NO + +# Set the SIP_SUPPORT tag to YES if your project consists of sip (see: +# http://www.riverbankcomputing.co.uk/software/sip/intro) sources only. Doxygen +# will parse them like normal C++ but will assume all classes use public instead +# of private inheritance when no explicit protection keyword is present. +# The default value is: NO. + +SIP_SUPPORT = NO + +# For Microsoft's IDL there are propget and propput attributes to indicate +# getter and setter methods for a property. Setting this option to YES will make +# doxygen to replace the get and set methods by a property in the documentation. +# This will only work if the methods are indeed getting or setting a simple +# type. If this is not the case, or you want to show the methods anyway, you +# should set this option to NO. +# The default value is: YES. + +IDL_PROPERTY_SUPPORT = YES + +# If member grouping is used in the documentation and the DISTRIBUTE_GROUP_DOC +# tag is set to YES then doxygen will reuse the documentation of the first +# member in the group (if any) for the other members of the group. By default +# all members of a group must be documented explicitly. +# The default value is: NO. + +DISTRIBUTE_GROUP_DOC = NO + +# If one adds a struct or class to a group and this option is enabled, then also +# any nested class or struct is added to the same group. By default this option +# is disabled and one has to add nested compounds explicitly via \ingroup. +# The default value is: NO. + +GROUP_NESTED_COMPOUNDS = NO + +# Set the SUBGROUPING tag to YES to allow class member groups of the same type +# (for instance a group of public functions) to be put as a subgroup of that +# type (e.g. under the Public Functions section). Set it to NO to prevent +# subgrouping. Alternatively, this can be done per class using the +# \nosubgrouping command. +# The default value is: YES. + +SUBGROUPING = YES + +# When the INLINE_GROUPED_CLASSES tag is set to YES, classes, structs and unions +# are shown inside the group in which they are included (e.g. 
using \ingroup) +# instead of on a separate page (for HTML and Man pages) or section (for LaTeX +# and RTF). +# +# Note that this feature does not work in combination with +# SEPARATE_MEMBER_PAGES. +# The default value is: NO. + +INLINE_GROUPED_CLASSES = NO + +# When the INLINE_SIMPLE_STRUCTS tag is set to YES, structs, classes, and unions +# with only public data fields or simple typedef fields will be shown inline in +# the documentation of the scope in which they are defined (i.e. file, +# namespace, or group documentation), provided this scope is documented. If set +# to NO, structs, classes, and unions are shown on a separate page (for HTML and +# Man pages) or section (for LaTeX and RTF). +# The default value is: NO. + +INLINE_SIMPLE_STRUCTS = YES + +# When TYPEDEF_HIDES_STRUCT tag is enabled, a typedef of a struct, union, or +# enum is documented as struct, union, or enum with the name of the typedef. So +# typedef struct TypeS {} TypeT, will appear in the documentation as a struct +# with name TypeT. When disabled the typedef will appear as a member of a file, +# namespace, or class. And the struct will be named TypeS. This can typically be +# useful for C code in case the coding convention dictates that all compound +# types are typedef'ed and only the typedef is referenced, never the tag name. +# The default value is: NO. + +TYPEDEF_HIDES_STRUCT = NO + +# The size of the symbol lookup cache can be set using LOOKUP_CACHE_SIZE. This +# cache is used to resolve symbols given their name and scope. Since this can be +# an expensive process and often the same symbol appears multiple times in the +# code, doxygen keeps a cache of pre-resolved symbols. If the cache is too small +# doxygen will become slower. If the cache is too large, memory is wasted. The +# cache size is given by this formula: 2^(16+LOOKUP_CACHE_SIZE). The valid range +# is 0..9, the default is 0, corresponding to a cache size of 2^16=65536 +# symbols. At the end of a run doxygen will report the cache usage and suggest +# the optimal cache size from a speed point of view. +# Minimum value: 0, maximum value: 9, default value: 0. + +LOOKUP_CACHE_SIZE = 0 + +#--------------------------------------------------------------------------- +# Build related configuration options +#--------------------------------------------------------------------------- + +# If the EXTRACT_ALL tag is set to YES, doxygen will assume all entities in +# documentation are documented, even if no documentation was available. Private +# class members and static file members will be hidden unless the +# EXTRACT_PRIVATE respectively EXTRACT_STATIC tags are set to YES. +# Note: This will also disable the warnings about undocumented members that are +# normally produced when WARNINGS is set to YES. +# The default value is: NO. + +EXTRACT_ALL = YES + +# If the EXTRACT_PRIVATE tag is set to YES, all private members of a class will +# be included in the documentation. +# The default value is: NO. + +EXTRACT_PRIVATE = NO + +# If the EXTRACT_PACKAGE tag is set to YES, all members with package or internal +# scope will be included in the documentation. +# The default value is: NO. + +EXTRACT_PACKAGE = NO + +# If the EXTRACT_STATIC tag is set to YES, all static members of a file will be +# included in the documentation. +# The default value is: NO. + +EXTRACT_STATIC = YES + +# If the EXTRACT_LOCAL_CLASSES tag is set to YES, classes (and structs) defined +# locally in source files will be included in the documentation. 
If set to NO, +# only classes defined in header files are included. Does not have any effect +# for Java sources. +# The default value is: YES. + +EXTRACT_LOCAL_CLASSES = YES + +# This flag is only useful for Objective-C code. If set to YES, local methods, +# which are defined in the implementation section but not in the interface are +# included in the documentation. If set to NO, only methods in the interface are +# included. +# The default value is: NO. + +EXTRACT_LOCAL_METHODS = NO + +# If this flag is set to YES, the members of anonymous namespaces will be +# extracted and appear in the documentation as a namespace called +# 'anonymous_namespace{file}', where file will be replaced with the base name of +# the file that contains the anonymous namespace. By default anonymous namespace +# are hidden. +# The default value is: NO. + +EXTRACT_ANON_NSPACES = NO + +# If the HIDE_UNDOC_MEMBERS tag is set to YES, doxygen will hide all +# undocumented members inside documented classes or files. If set to NO these +# members will be included in the various overviews, but no documentation +# section is generated. This option has no effect if EXTRACT_ALL is enabled. +# The default value is: NO. + +HIDE_UNDOC_MEMBERS = NO + +# If the HIDE_UNDOC_CLASSES tag is set to YES, doxygen will hide all +# undocumented classes that are normally visible in the class hierarchy. If set +# to NO, these classes will be included in the various overviews. This option +# has no effect if EXTRACT_ALL is enabled. +# The default value is: NO. + +HIDE_UNDOC_CLASSES = NO + +# If the HIDE_FRIEND_COMPOUNDS tag is set to YES, doxygen will hide all friend +# (class|struct|union) declarations. If set to NO, these declarations will be +# included in the documentation. +# The default value is: NO. + +HIDE_FRIEND_COMPOUNDS = NO + +# If the HIDE_IN_BODY_DOCS tag is set to YES, doxygen will hide any +# documentation blocks found inside the body of a function. If set to NO, these +# blocks will be appended to the function's detailed documentation block. +# The default value is: NO. + +HIDE_IN_BODY_DOCS = NO + +# The INTERNAL_DOCS tag determines if documentation that is typed after a +# \internal command is included. If the tag is set to NO then the documentation +# will be excluded. Set it to YES to include the internal documentation. +# The default value is: NO. + +INTERNAL_DOCS = NO + +# If the CASE_SENSE_NAMES tag is set to NO then doxygen will only generate file +# names in lower-case letters. If set to YES, upper-case letters are also +# allowed. This is useful if you have classes or files whose names only differ +# in case and if your file system supports case sensitive file names. Windows +# and Mac users are advised to set this option to NO. +# The default value is: system dependent. + +CASE_SENSE_NAMES = YES + +# If the HIDE_SCOPE_NAMES tag is set to NO then doxygen will show members with +# their full class and namespace scopes in the documentation. If set to YES, the +# scope will be hidden. +# The default value is: NO. + +HIDE_SCOPE_NAMES = YES + +# If the HIDE_COMPOUND_REFERENCE tag is set to NO (default) then doxygen will +# append additional text to a page's title, such as Class Reference. If set to +# YES the compound reference will be hidden. +# The default value is: NO. + +HIDE_COMPOUND_REFERENCE= NO + +# If the SHOW_INCLUDE_FILES tag is set to YES then doxygen will put a list of +# the files that are included by a file in the documentation of that file. +# The default value is: YES. 
+ +SHOW_INCLUDE_FILES = YES + +# If the SHOW_GROUPED_MEMB_INC tag is set to YES then Doxygen will add for each +# grouped member an include statement to the documentation, telling the reader +# which file to include in order to use the member. +# The default value is: NO. + +SHOW_GROUPED_MEMB_INC = NO + +# If the FORCE_LOCAL_INCLUDES tag is set to YES then doxygen will list include +# files with double quotes in the documentation rather than with sharp brackets. +# The default value is: NO. + +FORCE_LOCAL_INCLUDES = NO + +# If the INLINE_INFO tag is set to YES then a tag [inline] is inserted in the +# documentation for inline members. +# The default value is: YES. + +INLINE_INFO = YES + +# If the SORT_MEMBER_DOCS tag is set to YES then doxygen will sort the +# (detailed) documentation of file and class members alphabetically by member +# name. If set to NO, the members will appear in declaration order. +# The default value is: YES. + +SORT_MEMBER_DOCS = YES + +# If the SORT_BRIEF_DOCS tag is set to YES then doxygen will sort the brief +# descriptions of file, namespace and class members alphabetically by member +# name. If set to NO, the members will appear in declaration order. Note that +# this will also influence the order of the classes in the class list. +# The default value is: NO. + +SORT_BRIEF_DOCS = NO + +# If the SORT_MEMBERS_CTORS_1ST tag is set to YES then doxygen will sort the +# (brief and detailed) documentation of class members so that constructors and +# destructors are listed first. If set to NO the constructors will appear in the +# respective orders defined by SORT_BRIEF_DOCS and SORT_MEMBER_DOCS. +# Note: If SORT_BRIEF_DOCS is set to NO this option is ignored for sorting brief +# member documentation. +# Note: If SORT_MEMBER_DOCS is set to NO this option is ignored for sorting +# detailed member documentation. +# The default value is: NO. + +SORT_MEMBERS_CTORS_1ST = NO + +# If the SORT_GROUP_NAMES tag is set to YES then doxygen will sort the hierarchy +# of group names into alphabetical order. If set to NO the group names will +# appear in their defined order. +# The default value is: NO. + +SORT_GROUP_NAMES = NO + +# If the SORT_BY_SCOPE_NAME tag is set to YES, the class list will be sorted by +# fully-qualified names, including namespaces. If set to NO, the class list will +# be sorted only by class name, not including the namespace part. +# Note: This option is not very useful if HIDE_SCOPE_NAMES is set to YES. +# Note: This option applies only to the class list, not to the alphabetical +# list. +# The default value is: NO. + +SORT_BY_SCOPE_NAME = NO + +# If the STRICT_PROTO_MATCHING option is enabled and doxygen fails to do proper +# type resolution of all parameters of a function it will reject a match between +# the prototype and the implementation of a member function even if there is +# only one candidate or it is obvious which candidate to choose by doing a +# simple string match. By disabling STRICT_PROTO_MATCHING doxygen will still +# accept a match between prototype and implementation in such cases. +# The default value is: NO. + +STRICT_PROTO_MATCHING = NO + +# The GENERATE_TODOLIST tag can be used to enable (YES) or disable (NO) the todo +# list. This list is created by putting \todo commands in the documentation. +# The default value is: YES. + +GENERATE_TODOLIST = YES + +# The GENERATE_TESTLIST tag can be used to enable (YES) or disable (NO) the test +# list. This list is created by putting \test commands in the documentation. 
+# The default value is: YES. + +GENERATE_TESTLIST = YES + +# The GENERATE_BUGLIST tag can be used to enable (YES) or disable (NO) the bug +# list. This list is created by putting \bug commands in the documentation. +# The default value is: YES. + +GENERATE_BUGLIST = YES + +# The GENERATE_DEPRECATEDLIST tag can be used to enable (YES) or disable (NO) +# the deprecated list. This list is created by putting \deprecated commands in +# the documentation. +# The default value is: YES. + +GENERATE_DEPRECATEDLIST= YES + +# The ENABLED_SECTIONS tag can be used to enable conditional documentation +# sections, marked by \if ... \endif and \cond +# ... \endcond blocks. + +ENABLED_SECTIONS = + +# The MAX_INITIALIZER_LINES tag determines the maximum number of lines that the +# initial value of a variable or macro / define can have for it to appear in the +# documentation. If the initializer consists of more lines than specified here +# it will be hidden. Use a value of 0 to hide initializers completely. The +# appearance of the value of individual variables and macros / defines can be +# controlled using \showinitializer or \hideinitializer command in the +# documentation regardless of this setting. +# Minimum value: 0, maximum value: 10000, default value: 30. + +MAX_INITIALIZER_LINES = 30 + +# Set the SHOW_USED_FILES tag to NO to disable the list of files generated at +# the bottom of the documentation of classes and structs. If set to YES, the +# list will mention the files that were used to generate the documentation. +# The default value is: YES. + +SHOW_USED_FILES = YES + +# Set the SHOW_FILES tag to NO to disable the generation of the Files page. This +# will remove the Files entry from the Quick Index and from the Folder Tree View +# (if specified). +# The default value is: YES. + +SHOW_FILES = YES + +# Set the SHOW_NAMESPACES tag to NO to disable the generation of the Namespaces +# page. This will remove the Namespaces entry from the Quick Index and from the +# Folder Tree View (if specified). +# The default value is: YES. + +SHOW_NAMESPACES = YES + +# The FILE_VERSION_FILTER tag can be used to specify a program or script that +# doxygen should invoke to get the current version for each file (typically from +# the version control system). Doxygen will invoke the program by executing (via +# popen()) the command command input-file, where command is the value of the +# FILE_VERSION_FILTER tag, and input-file is the name of an input file provided +# by doxygen. Whatever the program writes to standard output is used as the file +# version. For an example see the documentation. + +FILE_VERSION_FILTER = + +# The LAYOUT_FILE tag can be used to specify a layout file which will be parsed +# by doxygen. The layout file controls the global structure of the generated +# output files in an output format independent way. To create the layout file +# that represents doxygen's defaults, run doxygen with the -l option. You can +# optionally specify a file name after the option, if omitted DoxygenLayout.xml +# will be used as the name of the layout file. +# +# Note that if you run doxygen from a directory containing a file called +# DoxygenLayout.xml, doxygen will parse it automatically even if the LAYOUT_FILE +# tag is left empty. + +LAYOUT_FILE = doxygen/doxlayout.xml + +# The CITE_BIB_FILES tag can be used to specify one or more bib files containing +# the reference definitions. This must be a list of .bib files. The .bib +# extension is automatically appended if omitted. 
This requires the bibtex tool +# to be installed. See also http://en.wikipedia.org/wiki/BibTeX for more info. +# For LaTeX the style of the bibliography can be controlled using +# LATEX_BIB_STYLE. To use this feature you need bibtex and perl available in the +# search path. See also \cite for info how to create references. + +CITE_BIB_FILES = + +#--------------------------------------------------------------------------- +# Configuration options related to warning and progress messages +#--------------------------------------------------------------------------- + +# The QUIET tag can be used to turn on/off the messages that are generated to +# standard output by doxygen. If QUIET is set to YES this implies that the +# messages are off. +# The default value is: NO. + +QUIET = NO + +# The WARNINGS tag can be used to turn on/off the warning messages that are +# generated to standard error (stderr) by doxygen. If WARNINGS is set to YES +# this implies that the warnings are on. +# +# Tip: Turn warnings on while writing the documentation. +# The default value is: YES. + +WARNINGS = YES + +# If the WARN_IF_UNDOCUMENTED tag is set to YES then doxygen will generate +# warnings for undocumented members. If EXTRACT_ALL is set to YES then this flag +# will automatically be disabled. +# The default value is: YES. + +WARN_IF_UNDOCUMENTED = YES + +# If the WARN_IF_DOC_ERROR tag is set to YES, doxygen will generate warnings for +# potential errors in the documentation, such as not documenting some parameters +# in a documented function, or documenting parameters that don't exist or using +# markup commands wrongly. +# The default value is: YES. + +WARN_IF_DOC_ERROR = YES + +# This WARN_NO_PARAMDOC option can be enabled to get warnings for functions that +# are documented, but have no documentation for their parameters or return +# value. If set to NO, doxygen will only warn about wrong or incomplete +# parameter documentation, but not about the absence of documentation. +# The default value is: NO. + +WARN_NO_PARAMDOC = NO + +# If the WARN_AS_ERROR tag is set to YES then doxygen will immediately stop when +# a warning is encountered. +# The default value is: NO. + +WARN_AS_ERROR = NO + +# The WARN_FORMAT tag determines the format of the warning messages that doxygen +# can produce. The string should contain the $file, $line, and $text tags, which +# will be replaced by the file and line number from which the warning originated +# and the warning text. Optionally the format may contain $version, which will +# be replaced by the version of the file (if it could be obtained via +# FILE_VERSION_FILTER) +# The default value is: $file:$line: $text. + +WARN_FORMAT = "$file:$line: $text" + +# The WARN_LOGFILE tag can be used to specify a file to which warning and error +# messages should be written. If left blank the output is written to standard +# error (stderr). + +WARN_LOGFILE = + +#--------------------------------------------------------------------------- +# Configuration options related to the input files +#--------------------------------------------------------------------------- + +# The INPUT tag is used to specify the files and/or directories that contain +# documented source files. You may enter file names like myfile.cpp or +# directories like /usr/src/myproject. Separate the files or directories with +# spaces. See also FILE_PATTERNS and EXTENSION_MAPPING +# Note: If this tag is empty the current directory is searched. 
+ +INPUT = ../com.unity.ml-agents/Runtime/ + +# This tag can be used to specify the character encoding of the source files +# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses +# libiconv (or the iconv built into libc) for the transcoding. See the libiconv +# documentation (see: http://www.gnu.org/software/libiconv) for the list of +# possible encodings. +# The default value is: UTF-8. + +INPUT_ENCODING = UTF-8 + +# If the value of the INPUT tag contains directories, you can use the +# FILE_PATTERNS tag to specify one or more wildcard patterns (like *.cpp and +# *.h) to filter out the source-files in the directories. +# +# Note that for custom extensions or not directly supported extensions you also +# need to set EXTENSION_MAPPING for the extension otherwise the files are not +# read by doxygen. +# +# If left blank the following patterns are tested:*.c, *.cc, *.cxx, *.cpp, +# *.c++, *.java, *.ii, *.ixx, *.ipp, *.i++, *.inl, *.idl, *.ddl, *.odl, *.h, +# *.hh, *.hxx, *.hpp, *.h++, *.cs, *.d, *.php, *.php4, *.php5, *.phtml, *.inc, +# *.m, *.markdown, *.md, *.mm, *.dox, *.py, *.pyw, *.f90, *.f95, *.f03, *.f08, +# *.f, *.for, *.tcl, *.vhd, *.vhdl, *.ucf and *.qsf. + +FILE_PATTERNS = *.cs \ + *.md \ + *.py + +# The RECURSIVE tag can be used to specify whether or not subdirectories should +# be searched for input files as well. +# The default value is: NO. + +RECURSIVE = YES + +# The EXCLUDE tag can be used to specify files and/or directories that should be +# excluded from the INPUT source files. This way you can easily exclude a +# subdirectory from a directory tree whose root is specified with the INPUT tag. +# +# Note that relative paths are relative to the directory from which doxygen is +# run. + +EXCLUDE = + +# The EXCLUDE_SYMLINKS tag can be used to select whether or not files or +# directories that are symbolic links (a Unix file system feature) are excluded +# from the input. +# The default value is: NO. + +EXCLUDE_SYMLINKS = NO + +# If the value of the INPUT tag contains directories, you can use the +# EXCLUDE_PATTERNS tag to specify one or more wildcard patterns to exclude +# certain files from those directories. +# +# Note that the wildcards are matched against the file with absolute path, so to +# exclude all test directories for example use the pattern */test/* + +EXCLUDE_PATTERNS = + +# The EXCLUDE_SYMBOLS tag can be used to specify one or more symbol names +# (namespaces, classes, functions, etc.) that should be excluded from the +# output. The symbol name can be a fully qualified name, a word, or if the +# wildcard * is used, a substring. Examples: ANamespace, AClass, +# AClass::ANamespace, ANamespace::*Test +# +# Note that the wildcards are matched against the file with absolute path, so to +# exclude all test directories use the pattern */test/* + +EXCLUDE_SYMBOLS = + +# The EXAMPLE_PATH tag can be used to specify one or more files or directories +# that contain example code fragments that are included (see the \include +# command). + +EXAMPLE_PATH = + +# If the value of the EXAMPLE_PATH tag contains directories, you can use the +# EXAMPLE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp and +# *.h) to filter out the source-files in the directories. If left blank all +# files are included. + +EXAMPLE_PATTERNS = + +# If the EXAMPLE_RECURSIVE tag is set to YES then subdirectories will be +# searched for input files to be used with the \include or \dontinclude commands +# irrespective of the value of the RECURSIVE tag. 
+# The default value is: NO. + +EXAMPLE_RECURSIVE = NO + +# The IMAGE_PATH tag can be used to specify one or more files or directories +# that contain images that are to be included in the documentation (see the +# \image command). + +IMAGE_PATH = images + +# The INPUT_FILTER tag can be used to specify a program that doxygen should +# invoke to filter for each input file. Doxygen will invoke the filter program +# by executing (via popen()) the command: +# +# +# +# where is the value of the INPUT_FILTER tag, and is the +# name of an input file. Doxygen will then use the output that the filter +# program writes to standard output. If FILTER_PATTERNS is specified, this tag +# will be ignored. +# +# Note that the filter must not add or remove lines; it is applied before the +# code is scanned, but not when the output code is generated. If lines are added +# or removed, the anchors will not be placed correctly. +# +# Note that for custom extensions or not directly supported extensions you also +# need to set EXTENSION_MAPPING for the extension otherwise the files are not +# properly processed by doxygen. + +INPUT_FILTER = + +# The FILTER_PATTERNS tag can be used to specify filters on a per file pattern +# basis. Doxygen will compare the file name with each pattern and apply the +# filter if there is a match. The filters are a list of the form: pattern=filter +# (like *.cpp=my_cpp_filter). See INPUT_FILTER for further information on how +# filters are used. If the FILTER_PATTERNS tag is empty or if none of the +# patterns match the file name, INPUT_FILTER is applied. +# +# Note that for custom extensions or not directly supported extensions you also +# need to set EXTENSION_MAPPING for the extension otherwise the files are not +# properly processed by doxygen. + +FILTER_PATTERNS = + +# If the FILTER_SOURCE_FILES tag is set to YES, the input filter (if set using +# INPUT_FILTER) will also be used to filter the input files that are used for +# producing the source files to browse (i.e. when SOURCE_BROWSER is set to YES). +# The default value is: NO. + +FILTER_SOURCE_FILES = NO + +# The FILTER_SOURCE_PATTERNS tag can be used to specify source filters per file +# pattern. A pattern will override the setting for FILTER_PATTERN (if any) and +# it is also possible to disable source filtering for a specific pattern using +# *.ext= (so without naming a filter). +# This tag requires that the tag FILTER_SOURCE_FILES is set to YES. + +FILTER_SOURCE_PATTERNS = + +# If the USE_MDFILE_AS_MAINPAGE tag refers to the name of a markdown file that +# is part of the input, its contents will be placed on the main page +# (index.html). This can be useful if you have a project on for instance GitHub +# and want to reuse the introduction page also for the doxygen output. + +USE_MDFILE_AS_MAINPAGE = Unity-Agents-Overview.md + +#--------------------------------------------------------------------------- +# Configuration options related to source browsing +#--------------------------------------------------------------------------- + +# If the SOURCE_BROWSER tag is set to YES then a list of source files will be +# generated. Documented entities will be cross-referenced with these sources. +# +# Note: To get rid of all source code in the generated output, make sure that +# also VERBATIM_HEADERS is set to NO. +# The default value is: NO. + +SOURCE_BROWSER = NO + +# Setting the INLINE_SOURCES tag to YES will include the body of functions, +# classes and enums directly into the documentation. +# The default value is: NO. 
+ +INLINE_SOURCES = NO + +# Setting the STRIP_CODE_COMMENTS tag to YES will instruct doxygen to hide any +# special comment blocks from generated source code fragments. Normal C, C++ and +# Fortran comments will always remain visible. +# The default value is: YES. + +STRIP_CODE_COMMENTS = YES + +# If the REFERENCED_BY_RELATION tag is set to YES then for each documented +# function all documented functions referencing it will be listed. +# The default value is: NO. + +REFERENCED_BY_RELATION = NO + +# If the REFERENCES_RELATION tag is set to YES then for each documented function +# all documented entities called/used by that function will be listed. +# The default value is: NO. + +REFERENCES_RELATION = NO + +# If the REFERENCES_LINK_SOURCE tag is set to YES and SOURCE_BROWSER tag is set +# to YES then the hyperlinks from functions in REFERENCES_RELATION and +# REFERENCED_BY_RELATION lists will link to the source code. Otherwise they will +# link to the documentation. +# The default value is: YES. + +REFERENCES_LINK_SOURCE = YES + +# If SOURCE_TOOLTIPS is enabled (the default) then hovering a hyperlink in the +# source code will show a tooltip with additional information such as prototype, +# brief description and links to the definition and documentation. Since this +# will make the HTML file larger and loading of large files a bit slower, you +# can opt to disable this feature. +# The default value is: YES. +# This tag requires that the tag SOURCE_BROWSER is set to YES. + +SOURCE_TOOLTIPS = YES + +# If the USE_HTAGS tag is set to YES then the references to source code will +# point to the HTML generated by the htags(1) tool instead of doxygen built-in +# source browser. The htags tool is part of GNU's global source tagging system +# (see http://www.gnu.org/software/global/global.html). You will need version +# 4.8.6 or higher. +# +# To use it do the following: +# - Install the latest version of global +# - Enable SOURCE_BROWSER and USE_HTAGS in the config file +# - Make sure the INPUT points to the root of the source tree +# - Run doxygen as normal +# +# Doxygen will invoke htags (and that will in turn invoke gtags), so these +# tools must be available from the command line (i.e. in the search path). +# +# The result: instead of the source browser generated by doxygen, the links to +# source code will now point to the output of htags. +# The default value is: NO. +# This tag requires that the tag SOURCE_BROWSER is set to YES. + +USE_HTAGS = NO + +# If the VERBATIM_HEADERS tag is set the YES then doxygen will generate a +# verbatim copy of the header file for each class for which an include is +# specified. Set to NO to disable this. +# See also: Section \class. +# The default value is: YES. + +VERBATIM_HEADERS = YES + +# If the CLANG_ASSISTED_PARSING tag is set to YES then doxygen will use the +# clang parser (see: http://clang.llvm.org/) for more accurate parsing at the +# cost of reduced performance. This can be particularly helpful with template +# rich C++ code for which doxygen's built-in parser lacks the necessary type +# information. +# Note: The availability of this option depends on whether or not doxygen was +# generated with the -Duse-libclang=ON option for CMake. +# The default value is: NO. + +#CLANG_ASSISTED_PARSING = NO + +# If clang assisted parsing is enabled you can provide the compiler with command +# line options that you would normally use when invoking the compiler. 
Note that +# the include paths will already be set by doxygen for the files and directories +# specified with INPUT and INCLUDE_PATH. +# This tag requires that the tag CLANG_ASSISTED_PARSING is set to YES. + +#CLANG_OPTIONS = + +#--------------------------------------------------------------------------- +# Configuration options related to the alphabetical class index +#--------------------------------------------------------------------------- + +# If the ALPHABETICAL_INDEX tag is set to YES, an alphabetical index of all +# compounds will be generated. Enable this if the project contains a lot of +# classes, structs, unions or interfaces. +# The default value is: YES. + +ALPHABETICAL_INDEX = YES + +# The COLS_IN_ALPHA_INDEX tag can be used to specify the number of columns in +# which the alphabetical index list will be split. +# Minimum value: 1, maximum value: 20, default value: 5. +# This tag requires that the tag ALPHABETICAL_INDEX is set to YES. + +COLS_IN_ALPHA_INDEX = 2 + +# In case all classes in a project start with a common prefix, all classes will +# be put under the same header in the alphabetical index. The IGNORE_PREFIX tag +# can be used to specify a prefix (or a list of prefixes) that should be ignored +# while generating the index headers. +# This tag requires that the tag ALPHABETICAL_INDEX is set to YES. + +IGNORE_PREFIX = + +#--------------------------------------------------------------------------- +# Configuration options related to the HTML output +#--------------------------------------------------------------------------- + +# If the GENERATE_HTML tag is set to YES, doxygen will generate HTML output +# The default value is: YES. + +GENERATE_HTML = YES + +# The HTML_OUTPUT tag is used to specify where the HTML docs will be put. If a +# relative path is entered the value of OUTPUT_DIRECTORY will be put in front of +# it. +# The default directory is: html. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_OUTPUT = ./html + +# The HTML_FILE_EXTENSION tag can be used to specify the file extension for each +# generated HTML page (for example: .htm, .php, .asp). +# The default value is: .html. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_FILE_EXTENSION = .html + +# The HTML_HEADER tag can be used to specify a user-defined HTML header file for +# each generated HTML page. If the tag is left blank doxygen will generate a +# standard header. +# +# To get valid HTML the header file that includes any scripts and style sheets +# that doxygen needs, which is dependent on the configuration options used (e.g. +# the setting GENERATE_TREEVIEW). It is highly recommended to start with a +# default header using +# doxygen -w html new_header.html new_footer.html new_stylesheet.css +# YourConfigFile +# and then modify the file new_header.html. See also section "Doxygen usage" +# for information on how to generate the default header that doxygen normally +# uses. +# Note: The header is subject to change so you typically have to regenerate the +# default header when upgrading to a newer version of doxygen. For a description +# of the possible markers and block names see the documentation. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_HEADER = doxygen/header.html + +# The HTML_FOOTER tag can be used to specify a user-defined HTML footer for each +# generated HTML page. If the tag is left blank doxygen will generate a standard +# footer. 
See HTML_HEADER for more information on how to generate a default +# footer and what special commands can be used inside the footer. See also +# section "Doxygen usage" for information on how to generate the default footer +# that doxygen normally uses. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_FOOTER = doxygen/footer.html + +# The HTML_STYLESHEET tag can be used to specify a user-defined cascading style +# sheet that is used by each HTML page. It can be used to fine-tune the look of +# the HTML output. If left blank doxygen will generate a default style sheet. +# See also section "Doxygen usage" for information on how to generate the style +# sheet that doxygen normally uses. +# Note: It is recommended to use HTML_EXTRA_STYLESHEET instead of this tag, as +# it is more robust and this tag (HTML_STYLESHEET) will in the future become +# obsolete. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_STYLESHEET = doxygen/doxygenbase.css + +# The HTML_EXTRA_STYLESHEET tag can be used to specify additional user-defined +# cascading style sheets that are included after the standard style sheets +# created by doxygen. Using this option one can overrule certain style aspects. +# This is preferred over using HTML_STYLESHEET since it does not replace the +# standard style sheet and is therefore more robust against future updates. +# Doxygen will copy the style sheet files to the output directory. +# Note: The order of the extra style sheet files is of importance (e.g. the last +# style sheet in the list overrules the setting of the previous ones in the +# list). For an example see the documentation. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_EXTRA_STYLESHEET = doxygen/unity.css + +# The HTML_EXTRA_FILES tag can be used to specify one or more extra images or +# other source files which should be copied to the HTML output directory. Note +# that these files will be copied to the base HTML output directory. Use the +# $relpath^ marker in the HTML_HEADER and/or HTML_FOOTER files to load these +# files. In the HTML_STYLESHEET file, use the file name only. Also note that the +# files will be copied as-is; there are no commands or markers available. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_EXTRA_FILES = + +# The HTML_COLORSTYLE_HUE tag controls the color of the HTML output. Doxygen +# will adjust the colors in the style sheet and background images according to +# this color. Hue is specified as an angle on a colorwheel, see +# http://en.wikipedia.org/wiki/Hue for more information. For instance the value +# 0 represents red, 60 is yellow, 120 is green, 180 is cyan, 240 is blue, 300 +# purple, and 360 is red again. +# Minimum value: 0, maximum value: 359, default value: 220. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_COLORSTYLE_HUE = 220 + +# The HTML_COLORSTYLE_SAT tag controls the purity (or saturation) of the colors +# in the HTML output. For a value of 0 the output will use grayscales only. A +# value of 255 will produce the most vivid colors. +# Minimum value: 0, maximum value: 255, default value: 100. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_COLORSTYLE_SAT = 100 + +# The HTML_COLORSTYLE_GAMMA tag controls the gamma correction applied to the +# luminance component of the colors in the HTML output. Values below 100 +# gradually make the output lighter, whereas values above 100 make the output +# darker. 
The value divided by 100 is the actual gamma applied, so 80 represents +# a gamma of 0.8, The value 220 represents a gamma of 2.2, and 100 does not +# change the gamma. +# Minimum value: 40, maximum value: 240, default value: 80. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_COLORSTYLE_GAMMA = 80 + +# If the HTML_TIMESTAMP tag is set to YES then the footer of each generated HTML +# page will contain the date and time when the page was generated. Setting this +# to YES can help to show when doxygen was last run and thus if the +# documentation is up to date. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_TIMESTAMP = NO + +# If the HTML_DYNAMIC_SECTIONS tag is set to YES then the generated HTML +# documentation will contain sections that can be hidden and shown after the +# page has loaded. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_DYNAMIC_SECTIONS = NO + +# With HTML_INDEX_NUM_ENTRIES one can control the preferred number of entries +# shown in the various tree structured indices initially; the user can expand +# and collapse entries dynamically later on. Doxygen will expand the tree to +# such a level that at most the specified number of entries are visible (unless +# a fully collapsed tree already exceeds this amount). So setting the number of +# entries 1 will produce a full collapsed tree by default. 0 is a special value +# representing an infinite number of entries and will result in a full expanded +# tree by default. +# Minimum value: 0, maximum value: 9999, default value: 100. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_INDEX_NUM_ENTRIES = 100 + +# If the GENERATE_DOCSET tag is set to YES, additional index files will be +# generated that can be used as input for Apple's Xcode 3 integrated development +# environment (see: http://developer.apple.com/tools/xcode/), introduced with +# OSX 10.5 (Leopard). To create a documentation set, doxygen will generate a +# Makefile in the HTML output directory. Running make will produce the docset in +# that directory and running make install will install the docset in +# ~/Library/Developer/Shared/Documentation/DocSets so that Xcode will find it at +# startup. See http://developer.apple.com/tools/creatingdocsetswithdoxygen.html +# for more information. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +GENERATE_DOCSET = NO + +# This tag determines the name of the docset feed. A documentation feed provides +# an umbrella under which multiple documentation sets from a single provider +# (such as a company or product suite) can be grouped. +# The default value is: Doxygen generated docs. +# This tag requires that the tag GENERATE_DOCSET is set to YES. + +DOCSET_FEEDNAME = "Doxygen generated docs" + +# This tag specifies a string that should uniquely identify the documentation +# set bundle. This should be a reverse domain-name style string, e.g. +# com.mycompany.MyDocSet. Doxygen will append .docset to the name. +# The default value is: org.doxygen.Project. +# This tag requires that the tag GENERATE_DOCSET is set to YES. + +DOCSET_BUNDLE_ID = org.doxygen.Project + +# The DOCSET_PUBLISHER_ID tag specifies a string that should uniquely identify +# the documentation publisher. This should be a reverse domain-name style +# string, e.g. com.mycompany.MyDocSet.documentation. +# The default value is: org.doxygen.Publisher. 
+# This tag requires that the tag GENERATE_DOCSET is set to YES. + +DOCSET_PUBLISHER_ID = org.doxygen.Publisher + +# The DOCSET_PUBLISHER_NAME tag identifies the documentation publisher. +# The default value is: Publisher. +# This tag requires that the tag GENERATE_DOCSET is set to YES. + +DOCSET_PUBLISHER_NAME = Publisher + +# If the GENERATE_HTMLHELP tag is set to YES then doxygen generates three +# additional HTML index files: index.hhp, index.hhc, and index.hhk. The +# index.hhp is a project file that can be read by Microsoft's HTML Help Workshop +# (see: http://www.microsoft.com/en-us/download/details.aspx?id=21138) on +# Windows. +# +# The HTML Help Workshop contains a compiler that can convert all HTML output +# generated by doxygen into a single compiled HTML file (.chm). Compiled HTML +# files are now used as the Windows 98 help format, and will replace the old +# Windows help format (.hlp) on all Windows platforms in the future. Compressed +# HTML files also contain an index, a table of contents, and you can search for +# words in the documentation. The HTML workshop also contains a viewer for +# compressed HTML files. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +GENERATE_HTMLHELP = NO + +# The CHM_FILE tag can be used to specify the file name of the resulting .chm +# file. You can add a path in front of the file if the result should not be +# written to the html output directory. +# This tag requires that the tag GENERATE_HTMLHELP is set to YES. + +CHM_FILE = + +# The HHC_LOCATION tag can be used to specify the location (absolute path +# including file name) of the HTML help compiler (hhc.exe). If non-empty, +# doxygen will try to run the HTML help compiler on the generated index.hhp. +# The file has to be specified with full path. +# This tag requires that the tag GENERATE_HTMLHELP is set to YES. + +HHC_LOCATION = + +# The GENERATE_CHI flag controls if a separate .chi index file is generated +# (YES) or that it should be included in the master .chm file (NO). +# The default value is: NO. +# This tag requires that the tag GENERATE_HTMLHELP is set to YES. + +GENERATE_CHI = NO + +# The CHM_INDEX_ENCODING is used to encode HtmlHelp index (hhk), content (hhc) +# and project file content. +# This tag requires that the tag GENERATE_HTMLHELP is set to YES. + +CHM_INDEX_ENCODING = + +# The BINARY_TOC flag controls whether a binary table of contents is generated +# (YES) or a normal table of contents (NO) in the .chm file. Furthermore it +# enables the Previous and Next buttons. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTMLHELP is set to YES. + +BINARY_TOC = NO + +# The TOC_EXPAND flag can be set to YES to add extra items for group members to +# the table of contents of the HTML help documentation and to the tree view. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTMLHELP is set to YES. + +TOC_EXPAND = NO + +# If the GENERATE_QHP tag is set to YES and both QHP_NAMESPACE and +# QHP_VIRTUAL_FOLDER are set, an additional index file will be generated that +# can be used as input for Qt's qhelpgenerator to generate a Qt Compressed Help +# (.qch) of the generated HTML documentation. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +GENERATE_QHP = NO + +# If the QHG_LOCATION tag is specified, the QCH_FILE tag can be used to specify +# the file name of the resulting .qch file. The path specified is relative to +# the HTML output folder. 
+# This tag requires that the tag GENERATE_QHP is set to YES. + +QCH_FILE = + +# The QHP_NAMESPACE tag specifies the namespace to use when generating Qt Help +# Project output. For more information please see Qt Help Project / Namespace +# (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#namespace). +# The default value is: org.doxygen.Project. +# This tag requires that the tag GENERATE_QHP is set to YES. + +QHP_NAMESPACE = org.doxygen.Project + +# The QHP_VIRTUAL_FOLDER tag specifies the namespace to use when generating Qt +# Help Project output. For more information please see Qt Help Project / Virtual +# Folders (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#virtual- +# folders). +# The default value is: doc. +# This tag requires that the tag GENERATE_QHP is set to YES. + +QHP_VIRTUAL_FOLDER = doc + +# If the QHP_CUST_FILTER_NAME tag is set, it specifies the name of a custom +# filter to add. For more information please see Qt Help Project / Custom +# Filters (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#custom- +# filters). +# This tag requires that the tag GENERATE_QHP is set to YES. + +QHP_CUST_FILTER_NAME = + +# The QHP_CUST_FILTER_ATTRS tag specifies the list of the attributes of the +# custom filter to add. For more information please see Qt Help Project / Custom +# Filters (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#custom- +# filters). +# This tag requires that the tag GENERATE_QHP is set to YES. + +QHP_CUST_FILTER_ATTRS = + +# The QHP_SECT_FILTER_ATTRS tag specifies the list of the attributes this +# project's filter section matches. Qt Help Project / Filter Attributes (see: +# http://qt-project.org/doc/qt-4.8/qthelpproject.html#filter-attributes). +# This tag requires that the tag GENERATE_QHP is set to YES. + +QHP_SECT_FILTER_ATTRS = + +# The QHG_LOCATION tag can be used to specify the location of Qt's +# qhelpgenerator. If non-empty doxygen will try to run qhelpgenerator on the +# generated .qhp file. +# This tag requires that the tag GENERATE_QHP is set to YES. + +QHG_LOCATION = + +# If the GENERATE_ECLIPSEHELP tag is set to YES, additional index files will be +# generated, together with the HTML files, they form an Eclipse help plugin. To +# install this plugin and make it available under the help contents menu in +# Eclipse, the contents of the directory containing the HTML and XML files needs +# to be copied into the plugins directory of eclipse. The name of the directory +# within the plugins directory should be the same as the ECLIPSE_DOC_ID value. +# After copying Eclipse needs to be restarted before the help appears. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +GENERATE_ECLIPSEHELP = NO + +# A unique identifier for the Eclipse help plugin. When installing the plugin +# the directory name containing the HTML and XML files should also have this +# name. Each documentation set should have its own identifier. +# The default value is: org.doxygen.Project. +# This tag requires that the tag GENERATE_ECLIPSEHELP is set to YES. + +ECLIPSE_DOC_ID = org.doxygen.Project + +# If you want full control over the layout of the generated HTML pages it might +# be necessary to disable the index and replace it with your own. The +# DISABLE_INDEX tag can be used to turn on/off the condensed index (tabs) at top +# of each HTML page. A value of NO enables the index and the value YES disables +# it. 
Since the tabs in the index contain the same information as the navigation +# tree, you can set this option to YES if you also set GENERATE_TREEVIEW to YES. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +DISABLE_INDEX = NO + +# The GENERATE_TREEVIEW tag is used to specify whether a tree-like index +# structure should be generated to display hierarchical information. If the tag +# value is set to YES, a side panel will be generated containing a tree-like +# index structure (just like the one that is generated for HTML Help). For this +# to work a browser that supports JavaScript, DHTML, CSS and frames is required +# (i.e. any modern browser). Windows users are probably better off using the +# HTML help feature. Via custom style sheets (see HTML_EXTRA_STYLESHEET) one can +# further fine-tune the look of the index. As an example, the default style +# sheet generated by doxygen has an example that shows how to put an image at +# the root of the tree instead of the PROJECT_NAME. Since the tree basically has +# the same information as the tab index, you could consider setting +# DISABLE_INDEX to YES when enabling this option. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +GENERATE_TREEVIEW = YES + +# The ENUM_VALUES_PER_LINE tag can be used to set the number of enum values that +# doxygen will group on one line in the generated HTML documentation. +# +# Note that a value of 0 will completely suppress the enum values from appearing +# in the overview section. +# Minimum value: 0, maximum value: 20, default value: 4. +# This tag requires that the tag GENERATE_HTML is set to YES. + +ENUM_VALUES_PER_LINE = 4 + +# If the treeview is enabled (see GENERATE_TREEVIEW) then this tag can be used +# to set the initial width (in pixels) of the frame in which the tree is shown. +# Minimum value: 0, maximum value: 1500, default value: 250. +# This tag requires that the tag GENERATE_HTML is set to YES. + +TREEVIEW_WIDTH = 250 + +# If the EXT_LINKS_IN_WINDOW option is set to YES, doxygen will open links to +# external symbols imported via tag files in a separate window. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +EXT_LINKS_IN_WINDOW = NO + +# Use this tag to change the font size of LaTeX formulas included as images in +# the HTML documentation. When you change the font size after a successful +# doxygen run you need to manually remove any form_*.png images from the HTML +# output directory to force them to be regenerated. +# Minimum value: 8, maximum value: 50, default value: 10. +# This tag requires that the tag GENERATE_HTML is set to YES. + +FORMULA_FONTSIZE = 10 + +# Use the FORMULA_TRANSPARENT tag to determine whether or not the images +# generated for formulas are transparent PNGs. Transparent PNGs are not +# supported properly for IE 6.0, but are supported on all modern browsers. +# +# Note that when changing this option you need to delete any form_*.png files in +# the HTML output directory before the changes have effect. +# The default value is: YES. +# This tag requires that the tag GENERATE_HTML is set to YES. + +FORMULA_TRANSPARENT = YES + +# Enable the USE_MATHJAX option to render LaTeX formulas using MathJax (see +# http://www.mathjax.org) which uses client side Javascript for the rendering +# instead of using pre-rendered bitmaps. Use this if you do not have LaTeX +# installed or if you want the formulas to look prettier in the HTML output.
When +# enabled you may also need to install MathJax separately and configure the path +# to it using the MATHJAX_RELPATH option. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +USE_MATHJAX = NO + +# When MathJax is enabled you can set the default output format to be used for +# the MathJax output. See the MathJax site (see: +# http://docs.mathjax.org/en/latest/output.html) for more details. +# Possible values are: HTML-CSS (which is slower, but has the best +# compatibility), NativeMML (i.e. MathML) and SVG. +# The default value is: HTML-CSS. +# This tag requires that the tag USE_MATHJAX is set to YES. + +MATHJAX_FORMAT = HTML-CSS + +# When MathJax is enabled you need to specify the location relative to the HTML +# output directory using the MATHJAX_RELPATH option. The destination directory +# should contain the MathJax.js script. For instance, if the mathjax directory +# is located at the same level as the HTML output directory, then +# MATHJAX_RELPATH should be ../mathjax. The default value points to the MathJax +# Content Delivery Network so you can quickly see the result without installing +# MathJax. However, it is strongly recommended to install a local copy of +# MathJax from http://www.mathjax.org before deployment. +# The default value is: http://cdn.mathjax.org/mathjax/latest. +# This tag requires that the tag USE_MATHJAX is set to YES. + +MATHJAX_RELPATH = http://cdn.mathjax.org/mathjax/latest + +# The MATHJAX_EXTENSIONS tag can be used to specify one or more MathJax +# extension names that should be enabled during MathJax rendering. For example +# MATHJAX_EXTENSIONS = TeX/AMSmath TeX/AMSsymbols +# This tag requires that the tag USE_MATHJAX is set to YES. + +MATHJAX_EXTENSIONS = + +# The MATHJAX_CODEFILE tag can be used to specify a file with javascript pieces +# of code that will be used on startup of the MathJax code. See the MathJax site +# (see: http://docs.mathjax.org/en/latest/output.html) for more details. For an +# example see the documentation. +# This tag requires that the tag USE_MATHJAX is set to YES. + +MATHJAX_CODEFILE = + +# When the SEARCHENGINE tag is enabled doxygen will generate a search box for +# the HTML output. The underlying search engine uses javascript and DHTML and +# should work on any modern browser. Note that when using HTML help +# (GENERATE_HTMLHELP), Qt help (GENERATE_QHP), or docsets (GENERATE_DOCSET) +# there is already a search function so this one should typically be disabled. +# For large projects the javascript based search engine can be slow; then +# enabling SERVER_BASED_SEARCH may provide a better solution. It is possible to +# search using the keyboard; to jump to the search box use <access key> + S +# (what the <access key> is depends on the OS and browser, but it is typically +# <CTRL>, <ALT>/