Example Colab: [Open in Colab](https://colab.research.google.com/drive/16liQspSIzRazWb_qa_6Z0MRKmMTr2s1s?usp=sharing)

Example Colab with [SimplerEnv](https://github.com/simpler-env/SimplerEnv): [Open in Colab](https://colab.research.google.com/drive/1Fh6RNJ-eFOzzXBfyVC3wyqJfCI-t09ZJ?usp=sharing)

# mbodied agents
15
-
Welcome to **mbodied agents**, a toolkit for integrating state-of-the-art transformers into robotics systems. The goals for this repo are to minimize the ambiguouty, heterogeneity, and data scarcity currently holding generative AI back from wide-spread adoption in robotics. It provides strong type hints for the various types of robot actions and provides a unified interface for:
16
16
17
-
- Streaming to and from vision models such as Yolo and GPT4-o
17
+
Welcome to **mbodied agents**, a toolkit for integrating state-of-the-art transformers into robotics systems. mbodied agents is designed to provide a consistent interface for calling different AI models, handling multimodal data, using/creating datasets trained on different robots, and work for arbitrary observation and action spaces. It can be seamlessly integrated into real hardware or simulation.
18
+
19
+
The goals for this repo are to minimize the ambiguouty, heterogeneity, and data scarcity currently holding generative AI back from wide-spread adoption in robotics. It provides strong type hints for the various types of robot actions and provides a unified interface for:
20
+
21
+
- Streaming to and from vision models e.g. GPT4-o, OpenVLA, etc
- Handling multimodal data pipelines for setting up continual learning
- Automatically recording observations and actions to HDF5
- Exporting to the most popular ML formats such as [Gym Spaces](https://gymnasium.farama.org/index.html) and [Huggingface Datasets](https://huggingface.co/docs/datasets/en/index)

And most importantly, the entire library is **100% configurable to any observation and action space**. With **mbodied agents**, the days of wasting precious engineering time on tedious formatting and post-processing are over. Jump to [Getting Started](#getting-started) to get up and running on [real hardware](https://colab.research.google.com/drive/16liQspSIzRazWb_qa_6Z0MRKmMTr2s1s?usp=sharing) or a [MuJoCo simulation](https://colab.research.google.com/drive/1Fh6RNJ-eFOzzXBfyVC3wyqJfCI-t09ZJ?usp=sharing).

## Updates
Each time you interact with your robot, precious, feature-rich data enters your system and needs to be routed to the right place for later retrieval and processing. **mbodied agents** simplifies this process with explicit types and easy conversion to various ML-consumable formats. Our hope is to aid in the creation of intelligent, adaptable robots that learn from interactions and perform complex tasks in dynamic environments. Current features include:

- **Configurability**: Define your desired Observation and Action spaces and read data into the format that works best for your system.
- **Natural Language Control**: Use verbal prompts to correct a language agent's actions and calibrate its behavior to a new environment.
- **Modularity**: Easily swap out different backends, transformers, and hardware interfaces. For even better results, run multiple agents in separate threads.
- **Validation**: Ensure that your data is in the correct format and that your actions are within the correct bounds before sending them to the robot.

If you would like to integrate a new backend, sense, or motion control, it is very easy to do so.

- OpenAI
- Anthropic
- OpenVLA (for Motor Agent)
- RT1 (Coming Soon)
- HuggingFace (Coming Soon)
- More Open Source Models (Coming Soon)

### Roadmap

- [ ] Asynchronous Agent Execution
- [ ] More Support for In-context Learning from Natural Language
- [ ] Diffusion-based Data Augmentation

## Installation

```shell
pip install mbodied-agents
```

## Dev Environment Setup
### Real Robot Hardware
To run the Language Agent on real robot hardware, refer to our in-depth tutorial provided in the Colab link below:
[Open in Colab](https://colab.research.google.com/drive/1DAQkuuEYj8demiuJS1_10FIyTI78Yzh4?usp=sharing)
To run the Language Agent in simulation, i.e. SimplerEnv, click the following Colab to get started:
[Open in Colab](https://colab.research.google.com/drive/1Fh6RNJ-eFOzzXBfyVC3wyqJfCI-t09ZJ?usp=sharing)
To learn more about **SimplerEnv**, please visit the [SimplerEnv repository](https://github.com/simpler-env/SimplerEnv.git).

The Sample class is a base model for serializing, recording, and manipulating arbitrary data. Among other formats, a Sample can be converted to:

- A HuggingFace dataset with semantic search capabilities.
- A Pydantic BaseModel for reliable and quick json serialization/deserialization.

#### Creating a Sample

Creating a sample just requires subclassing or passing keyword arguments to the base Sample class:
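
For instance, a minimal sketch (the import path and field names here are illustrative assumptions, not the exact API):

```python
from mbodied_agents.base.sample import Sample  # assumed module path

# Create a Sample directly from keyword arguments.
observation = Sample(image=[0, 0, 0], instruction="pick up the fork")


# Or subclass Sample to get a strongly typed observation space.
class Observation(Sample):
    image: list[int]
    instruction: str


typed_observation = Observation(image=[0, 0, 0], instruction="pick up the fork")
```
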
#### Serialization and Deserialization with Pydantic
The Sample class leverages Pydantic's powerful features for serialization and deserialization, allowing you to easily convert between Sample instances and JSON.
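
A rough sketch of the round trip, assuming the standard Pydantic v2 methods are exposed on `Sample` (the import path is an assumption):

```python
from mbodied_agents.base.sample import Sample  # assumed module path

sample = Sample(image=[0, 0, 0], instruction="pick up the fork")

# Serialize the Sample to a JSON string via Pydantic.
json_str = sample.model_dump_json()

# Deserialize the JSON string back into a Sample instance.
restored = Sample.model_validate_json(json_str)
print(restored)
```
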
The [Backend](src/mbodied_agents/base/backend.py) class is an abstract base class for backend implementations. It provides the basic structure and methods required for interacting with different backend services, such as API calls for generating completions based on given messages. See the [backend directory](src/mbodied_agents/agents/backends) for how the various backends are implemented.
### Language Agent
The [Language Agent](src/mbodied_agents/agents/language/language_agent.py) is the main entry point for intelligent robot agents. It can connect to different backends or transformers of your choice. It includes methods for recording conversations, managing context, looking up messages, forgetting messages, storing context, and acting based on an instruction and an image.
Currently supported API services are OpenAI and Anthropic. Upcoming API services include Mbodi, Ollama, and HuggingFace. Stay tuned for our Mbodi backend service!
For example, to use OpenAI for your robot backend:
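
A minimal sketch of what that could look like; the import path, constructor arguments, and `act()` signature are assumptions based on the description above rather than the exact API:

```python
import os

# Assumed import, mirroring src/mbodied_agents/agents/language/language_agent.py.
from mbodied_agents.agents.language.language_agent import LanguageAgent

# Connect the agent to the OpenAI backend; argument names are illustrative.
agent = LanguageAgent(
    context="You are a helpful robot assistant.",
    api_key=os.getenv("OPENAI_API_KEY"),
)

# Act on a natural-language instruction plus the latest camera image.
image = "resources/example_image.png"  # placeholder; could equally be an array or Image type
response = agent.act("Pick up the fork on the table.", image)
print(response)
```
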
[Motor Agent](src/mbodied_agents/agents/motion/motor_agent.py) is similar to Language Agent, but instead of returning a string, it always returns a list of `Motion`. Motor Agent is generally powered by robotic transformer models, e.g. OpenVLA, RT1, Octo, etc.
Some small models, like RT1, can run on edge devices; others, like OpenVLA, are too large to run on the edge. See the [OpenVLA Agent](src/mbodied_agents/agents/motion/openvla_agent.py) and an [example OpenVLA server](src/mbodied_agents/agents/motion/openvla_example_server.py).
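
As a rough sketch of the usage pattern (the class name is inferred from the linked file path, and the constructor and `act()` arguments are assumptions):

```python
# Assumed import and class name, mirroring src/mbodied_agents/agents/motion/openvla_agent.py.
from mbodied_agents.agents.motion.openvla_agent import OpenVlaAgent

# Point the agent at a remotely hosted OpenVLA server (illustrative endpoint).
motor_agent = OpenVlaAgent(host="http://localhost:8000")

# Unlike the Language Agent, act() returns a list of Motion objects.
image = "resources/example_image.png"  # placeholder camera frame
motions = motor_agent.act("close the gripper", image)
for motion in motions:
    print(motion)
```
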
### Controls
The [controls](src/mbodied_agents/types/controls.py) module defines various motions to control a robot as Pydantic models. They are also subclassed from `Sample`, thus possessing all the capability of `Sample` as mentioned above. These controls cover a range of actions, from simple joint movements to complex poses and full robot control.
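
As an illustration, a hedged sketch of composing a control (the class and field names follow the pattern described above but are assumptions; see `controls.py` for the actual models):

```python
# Assumed imports; the actual models are defined in src/mbodied_agents/types/controls.py.
from mbodied_agents.types.controls import HandControl, JointControl, Pose6D

# An end-effector control composed of a 6-DoF pose and a gripper joint value.
action = HandControl(
    pose=Pose6D(x=0.1, y=0.0, z=0.2, roll=0.0, pitch=0.0, yaw=0.0),
    grasp=JointControl(value=0.5),
)

# Because controls subclass Sample, they inherit its serialization utilities.
print(action.model_dump_json())
```
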
Mapping actions from a model to robot hardware is straightforward. In our example script, we use a mock hardware interface; we also provide an [XArm interface](src/mbodied_agents/hardware/xarm_interface.py) as an example.
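
A minimal, self-contained sketch of the idea (the real interfaces live under `src/mbodied_agents/hardware/`; the `do()` method and everything else here are illustrative):

```python
class MockRobotInterface:
    """Stand-in for a hardware interface such as the XArm one."""

    def do(self, motion) -> None:
        # On real hardware this would translate the motion into controller commands.
        print(f"Executing motion: {motion}")


robot = MockRobotInterface()

# Hand each model-predicted motion to the hardware interface.
predicted_motions = []  # e.g. the list of Motion objects returned by a Motor Agent
for motion in predicted_motions:
    robot.do(motion)
```
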
### Recorder
The dataset [Recorder](src/mbodied_agents/data/recording.py) can record your conversation and the robot's actions to a dataset as you interact with or teach the robot. You can define any observation space and action space for the Recorder:
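
A hedged sketch, assuming the Recorder accepts Gym-style observation and action spaces (the constructor and `record()` arguments are illustrative):

```python
import gymnasium as gym
import numpy as np

# Assumed import, mirroring src/mbodied_agents/data/recording.py.
from mbodied_agents.data.recording import Recorder

# Illustrative observation and action spaces.
observation_space = gym.spaces.Dict({
    "image": gym.spaces.Box(low=0, high=255, shape=(224, 224, 3), dtype=np.uint8),
    "instruction": gym.spaces.Text(1000),
})
action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(7,), dtype=np.float32)

recorder = Recorder(
    "example_dataset",
    observation_space=observation_space,
    action_space=action_space,
)

# Record one observation/action pair per interaction step.
observation = {
    "image": np.zeros((224, 224, 3), dtype=np.uint8),
    "instruction": "pick up the fork",
}
action = np.zeros(7, dtype=np.float32)
recorder.record(observation, action)
```
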
## Directory Structure

```shell
├─ assets/ ............. Images, icons, and other static assets
├─ examples/ ........... Example scripts and usage demonstrations
└─ tests/ .............. Unit tests
```
## Contributing
See the [contributing guide](CONTRIBUTING.md) for more information.
Feel free to report any issues, ask questions, ask for features, or submit PRs.