Commit 1a6af9c

Update readme and Colab
1 parent 8218d28 commit 1a6af9c

6 files changed: +43 −43 lines changed


.github/workflows/macos.yml

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ jobs:
       - name: Run install script
         run: |
           chmod +x ./install.bash
-          ./install.bash
+          ./install.bash -y
 
       - name: Cache packages
         uses: actions/cache@v3

.github/workflows/ubuntu.yml

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ jobs:
       - name: Run install script
         run: |
           chmod +x ./install.bash
-          ./install.bash
+          ./install.bash -y
 
       - name: Cache packages
         uses: actions/cache@v3

README.md

Lines changed: 30 additions & 30 deletions
@@ -6,21 +6,24 @@
 [![PyPI Version](https://img.shields.io/pypi/v/mbodied-agents.svg)](https://pypi.python.org/pypi/mbodied-agents)
 [![Documentation Status](https://readthedocs.com/projects/mbodi-ai-mbodied-agents/badge/?version=latest)](https://mbodi-ai-mbodied-agents.readthedocs-hosted.com/en/latest/?badge=latest)

-
 Documentation: [mbodied agents docs](https://mbodi-ai-mbodied-agents.readthedocs-hosted.com/en)

-Example colab: [![Example Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/16liQspSIzRazWb_qa_6Z0MRKmMTr2s1s?usp=sharing)
+Example Colab: [![Example Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/16liQspSIzRazWb_qa_6Z0MRKmMTr2s1s?usp=sharing)
+
+Example Colab with [SimplerEnv](https://github.com/simpler-env/SimplerEnv): [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Fh6RNJ-eFOzzXBfyVC3wyqJfCI-t09ZJ?usp=sharing)

 # mbodied agents
-Welcome to **mbodied agents**, a toolkit for integrating state-of-the-art transformers into robotics systems. The goals for this repo are to minimize the ambiguouty, heterogeneity, and data scarcity currently holding generative AI back from wide-spread adoption in robotics. It provides strong type hints for the various types of robot actions and provides a unified interface for:

-- Streaming to and from vision models such as Yolo and GPT4-o
+Welcome to **mbodied agents**, a toolkit for integrating state-of-the-art transformers into robotics systems. mbodied agents is designed to provide a consistent interface for calling different AI models, handling multimodal data, using and creating datasets collected on different robots, and working with arbitrary observation and action spaces. It can be seamlessly integrated into real hardware or simulation.
+
+The goal of this repo is to minimize the ambiguity, heterogeneity, and data scarcity currently holding generative AI back from widespread adoption in robotics. It provides strong type hints for the various types of robot actions and a unified interface for:
+
+- Streaming to and from vision models, e.g. GPT-4o, OpenVLA, etc.
 - Handling multimodal data pipelines for setting up continual learning
 - Automatically recording observations and actions to hdf5
 - Exporting to the most popular ML formats such as [Gym Spaces](https://gymnasium.farama.org/index.html) and [Huggingface Datasets](https://huggingface.co/docs/datasets/en/index)
-
-And most importantly, the entire library is __100% configurable to any observation and action space__. That's right. With **mbodied agents**, the days of wasting precious engineering time on tedious formatting and post-processing are over. Jump to [Getting Started](#getting-started) to get up and running on [real hardware](https://colab.research.google.com/drive/16liQspSIzRazWb_qa_6Z0MRKmMTr2s1s?usp=sharing) or a [mujoco simulation](https://colab.research.google.com/drive/1sZtVLv17g9Lin1O2DyecBItWXwzUVUeH)

+And most importantly, the entire library is **100% configurable to any observation and action space**. With **mbodied agents**, the days of wasting precious engineering time on tedious formatting and post-processing are over. Jump to [Getting Started](#getting-started) to get up and running on [real hardware](https://colab.research.google.com/drive/16liQspSIzRazWb_qa_6Z0MRKmMTr2s1s?usp=sharing) or a [MuJoCo simulation](https://colab.research.google.com/drive/1Fh6RNJ-eFOzzXBfyVC3wyqJfCI-t09ZJ?usp=sharing).

 ## Updates

@@ -30,10 +33,8 @@ And most importantly, the entire library is __100% configurable to any observati

 <img src="assets/architecture.jpg" alt="Architecture Diagram" style="width: 650px;">

-
 <img src="assets/demo_gif.gif" alt="Demo GIF" style="width: 625px;">

-
 We welcome any questions, issues, or PRs!

 Please join our [Discord](https://discord.gg/RNzf3RCxRJ) for interesting discussions! **⭐ Give us a star on GitHub if you like us!**
@@ -48,15 +49,14 @@ Please join our [Discord](https://discord.gg/RNzf3RCxRJ) for interesting discuss
 - [Directory Structure](#directory-structure)
 - [Contributing](#contributing)

-
 ## Overview

 ## Why mbodied agents?

 Each time you interact with your robot, precious, feature-rich data enters your system and needs to be routed to the right place for later retrieval and processing. **mbodied agents** simplify this process with explicit types and easy conversion to various ML-consumable formats. Our hope is to aid in the creation of intelligent, adaptable robots that learn from interactions and perform complex tasks in dynamic environments. Current features include:

 - **Configurability** : Define your desired Observation and Action spaces and read data into the format that works best for your system.
-- **Natural Language Control** : Use verbal prompts to correct a cognitive agent's actions and calibrate its behavior to a new environment.
+- **Natural Language Control** : Use verbal prompts to correct a language agent's actions and calibrate its behavior to a new environment.
 - **Modularity** : Easily swap out different backends, transformers, and hardware interfaces. For even better results, run multiple agents in separate threads.
 - **Validation** : Ensure that your data is in the correct format and that your actions are within the correct bounds before sending them to the robot.

@@ -66,19 +66,22 @@ If you would like to integrate a new backend, sense, or motion control, it is ve

 - OpenAI
 - Anthropic
+- OpenVLA (for motor agent)
 - RT1 (Coming Soon)
-- OpenVLA (Coming Soon)
+- HuggingFace (Coming Soon)
 - More Open Source Models (Coming Soon)

 ### Roadmap

-- [ ] Asynchronous and Remote Agent Execution
+- [ ] Asynchronous Agent Execution
 - [ ] More Support for In-context Learning from Natural Language
 - [ ] Diffusion-based Data Augmentation

 ## Installation

-`pip install mbodied-agents`
+```
+pip install mbodied-agents
+```

 ## Dev Environment Setup

@@ -104,7 +107,7 @@ If you would like to integrate a new backend, sense, or motion control, it is ve

 ### Real Robot Hardware

-To run the Cognitive Agent on real robot hardware, refer to our in-depth tutorial provided in the Colab link below:
+To run the Language Agent on real robot hardware, refer to our in-depth tutorial provided in the Colab link below:

 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1DAQkuuEYj8demiuJS1_10FIyTI78Yzh4?usp=sharing)

@@ -119,9 +122,9 @@ python examples/simple_robot_agent.py --backend=openai

 ### SimplerEnv Simulation

-To run the Cognitive Agent in simulation, i.e. SimplerEnv, click the following Colab to get started:
+To run the Language Agent in simulation, i.e. SimplerEnv, click the following Colab to get started:

-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1sZtVLv17g9Lin1O2DyecBItWXwzUVUeH)
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Fh6RNJ-eFOzzXBfyVC3wyqJfCI-t09ZJ?usp=sharing)

 To learn more about **SimplerEnv**, please visit [![GitHub](https://img.shields.io/badge/GitHub-SimplerEnv-blue?logo=github)](https://github.com/simpler-env/SimplerEnv.git)

@@ -146,7 +149,6 @@ The Sample class is a base model for serializing, recording, and manipulating ar
 - A HuggingFace dataset with semantic search capabilities.
 - A Pydantic BaseModel for reliable and quick json serialization/deserialization.

-
 #### Creating a Sample

 Creating a sample just requires subclassing or passing keyword arguments to the base Sample class:
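For orientation, a minimal sketch of both styles mentioned in that line (the import path and field types below are assumptions inferred from snippets elsewhere in this diff):

```python
# Minimal sketch; the exact import path for Sample is an assumption.
from mbodied_agents.base.sample import Sample

# Pass keyword arguments directly to the base class...
sample = Sample(observation=[1, 2, 3], action=[4, 5, 6])

# ...or subclass it to pin down a fixed observation/action layout.
class PickAndPlaceSample(Sample):
    observation: list[int]
    action: list[int]

typed_sample = PickAndPlaceSample(observation=[1, 2, 3], action=[4, 5, 6])
```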
@@ -169,8 +171,6 @@ unflattened_sample = Sample.unflatten(flat_list, schema)
 print(unflattened_sample) # Output: Sample(observation=[1, 2, 3], action=[4, 5, 6])
 ```

-
-
 #### Serialization and Deserialization with Pydantic

 The Sample class leverages Pydantic's powerful features for serialization and deserialization, allowing you to easily convert between Sample instances and JSON.
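The round trip itself is standard Pydantic v2, matching the context lines of the next hunk (the Sample import is assumed to be the one from the sketch above):

```python
from pydantic_core import from_json

sample = Sample(observation=[1, 2, 3], action=[4, 5, 6])
json_data = sample.model_dump_json()                    # Sample -> JSON string
restored = Sample.model_validate(from_json(json_data))  # JSON string -> Sample
print(restored)  # Sample(observation=[1, 2, 3], action=[4, 5, 6])
```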
@@ -189,7 +189,6 @@ sample = Sample.model_validate(from_json(json_data))
 print(sample) # Output: Sample(observation=[1, 2, 3], action=[4, 5, 6])
 ```

-
 #### Converting to Different Containers

 <details> <summary>
@@ -217,9 +216,8 @@ print(sample_hf)
 # })

 ```
-</details>
-

+</details>

 #### Gym Space Integration

@@ -248,16 +246,16 @@ Message(role="user", content=[Sample("Hello")])

 The [Backend](src/mbodied_agents/base/backend.py) class is an abstract base class for Backend implementations. It provides the basic structure and methods required for interacting with different backend services, such as API calls for generating completions based on given messages. See [backend directory](src/mbodied_agents/agents/backends) on how various backends are implemented.

-### Cognitive Agent
+### Language Agent

-The [Cognitive Agent](src/mbodied_agents/agents/language/cognitive_agent.py) is the main entry point for intelligent robot agents. It can connect to different backends or transformers of your choice. It includes methods for recording conversations, managing context, looking up messages, forgetting messages, storing context, and acting based on an instruction and an image.
+The [Language Agent](src/mbodied_agents/agents/language/language_agent.py) is the main entry point for intelligent robot agents. It can connect to different backends or transformers of your choice. It includes methods for recording conversations, managing context, looking up messages, forgetting messages, storing context, and acting based on an instruction and an image.

 Currently supported API services are OpenAI and Anthropic. Upcoming API services include Mbodi, Ollama, and HuggingFace. Stay tuned for our Mbodi backend service!

 For example, to use OpenAI for your robot backend:

 ```python
-robot_agent = CognitiveAgent(context=context_prompt, api_service="openai")
+robot_agent = LanguageAgent(context=context_prompt, api_service="openai")
 ```

 `context` can be either a string or a list, for example:
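For illustration, a list-valued context built from the Message/Sample pattern shown in this hunk's header could look like the following (the Message and Sample import paths are assumptions):

```python
# Sketch only; the import paths for Message and Sample are assumptions.
from mbodied_agents.agents.language.language_agent import LanguageAgent
from mbodied_agents.types.message import Message
from mbodied_agents.base.sample import Sample

context = [
    Message(role="user", content=[Sample("Hello, robot.")]),
    Message(role="assistant", content=[Sample("Hello! How can I help today?")]),
]
robot_agent = LanguageAgent(context=context, api_service="openai")
```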
@@ -280,6 +278,11 @@ response = robot_agent.act(instruction, image)[0]
 response = robot_agent.act([instruction1, image1, instruction2, image2])[0]
 ```

+### Motor Agent
+
+[Motor Agent](src/mbodied_agents/agents/motion/motor_agent.py) is similar to Language Agent, but instead of returning a string, it always returns a list of `Motion`. Motor Agent is generally powered by robotic transformer models, e.g. OpenVLA, RT1, Octo, etc.
+Some smaller models, like RT1, can run on edge devices. Others, like OpenVLA, are too large to run on the edge. See [OpenVLA Agent](src/mbodied_agents/agents/motion/openvla_agent.py) and an [example OpenVLA server](src/mbodied_agents/agents/motion/openvla_example_server.py).
+
 ### Controls

 The [controls](src/mbodied_agents/types/controls.py) module defines various motions to control a robot as Pydantic models. They are also subclassed from `Sample`, thus possessing all the capability of `Sample` as mentioned above. These controls cover a range of actions, from simple joint movements to complex poses and full robot control.
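A rough sense of how the Motor Agent added in this hunk might be driven; the class name, constructor arguments, and `act()` signature here are illustrative assumptions, so check `openvla_agent.py` for the actual interface:

```python
import numpy as np

# Hypothetical usage; OpenVlaAgent, its constructor arguments, and the act()
# signature are assumptions, not something this diff confirms.
from mbodied_agents.agents.motion.openvla_agent import OpenVlaAgent

image = np.zeros((224, 224, 3), dtype=np.uint8)  # placeholder camera frame
motor_agent = OpenVlaAgent(host="https://my-openvla-server.example")  # e.g. a remote OpenVLA server
motions = motor_agent.act("pick up the fork", image)  # expected: a list of Motion

for motion in motions:
    print(motion)  # each Motion is a Pydantic model subclassed from Sample
```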
@@ -288,7 +291,6 @@ The [controls](src/mbodied_agents/types/controls.py) module defines various moti

 Mapping robot actions from a model to an action is very easy. In our example script, we use a mock hardware interface. We also have an [XArm interface](src/mbodied_agents/hardware/xarm_interface.py) as an example.

-
 ### Recorder

 Dataset [Recorder](src/mbodied_agents/data/recording.py) can record your conversation and the robot's actions to a dataset as you interact with/teach the robot. You can define any observation space and action space for the Recorder:
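As a hedged sketch of that idea, assuming the Recorder accepts gym-style spaces and exposes a `record()` call (argument names are assumptions; `recording.py` has the real signature):

```python
# Hedged sketch; the constructor arguments and record() signature are assumptions.
import numpy as np
from gymnasium import spaces
from mbodied_agents.data.recording import Recorder  # path taken from the link above

observation_space = spaces.Dict({
    "image": spaces.Box(low=0, high=255, shape=(224, 224, 3), dtype=np.uint8),
    "instruction": spaces.Text(1000),
})
action_space = spaces.Box(low=-1.0, high=1.0, shape=(7,), dtype=np.float32)  # e.g. a 7-DoF command

recorder = Recorder("example_dataset", observation_space=observation_space, action_space=action_space)
recorder.record(
    observation={"image": np.zeros((224, 224, 3), dtype=np.uint8), "instruction": "pick up the fork"},
    action=np.zeros(7, dtype=np.float32),
)
```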
@@ -320,7 +322,7 @@ for observation, action in replayer:
 ```

 ## Directory Structure
-```
+
 ```shell
 ├─ assets/ ............. Images, icons, and other static assets
 ├─ examples/ ........... Example scripts and usage demonstrations
@@ -338,10 +340,8 @@ for observation, action in replayer:
 └─ tests/ .............. Unit tests
 ```

-
 ## Contributing

-See the [contributing guide](CONTRIBUTING.md) for more information.
+See the [contributing guide](CONTRIBUTING.md) for more information.

 Feel free to report any issues, ask questions, ask for features, or submit PRs.
-
pyproject.toml

Lines changed: 3 additions & 3 deletions
@@ -37,15 +37,15 @@ dependencies = [
     "h5py",
     "click",
     "datasets",
-    "playsound",
-    "pyaudio",
-    "xarm-python-sdk",
     "jsonref",
     "art",
     "transformers",
     "gradio",
     "gradio_client",
     "open3d",
+    "playsound",
+    "pyaudio",
+    "xarm-python-sdk",
 ]

 [project.urls]

src/mbodied_agents/agents/sense/audio/audio_handler.py

Lines changed: 4 additions & 4 deletions
@@ -1,11 +1,11 @@
 # Copyright 2024 Mbodi AI
-#
+#
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
-#
+#
 # https://www.apache.org/licenses/LICENSE-2.0
-#
+#
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -23,7 +23,7 @@
     import pyaudio
 except ImportError:
     logging.warning(
-        "playsound or pyaudio is not installed. Please install them to enable audio functionality."
+        "playsound or pyaudio is not installed. Please run `pip install pyaudio playsound` to install."
     )

 from openai import OpenAI
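Reconstructed from the context lines above, the guarded import this warning belongs to looks roughly like this (the exact import form for playsound is an assumption):

```python
import logging

try:
    import playsound  # exact import form is an assumption
    import pyaudio
except ImportError:
    logging.warning(
        "playsound or pyaudio is not installed. Please run `pip install pyaudio playsound` to install."
    )
```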

src/mbodied_agents/hardware/xarm_interface.py

Lines changed: 4 additions & 4 deletions
@@ -1,19 +1,19 @@
 # Copyright 2024 Mbodi AI
-#
+#
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
-#
+#
 # https://www.apache.org/licenses/LICENSE-2.0
-#
+#
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.

 import math
-
+import logging
 from xarm.wrapper import XArmAPI

 from mbodied_agents.hardware.interface import HardwareInterface
