This repository contains a wheel-quadruped locomotion simulation and reinforcement-learning framework built on the Genesis simulator. It implements a complete training pipeline for Unitree wheel-quadruped robots (B2W and GO2W) using a custom reinforcement-learning toolkit. The goal of this project is to train and evaluate wheel-legged quadruped locomotion policies in simulation, and to provide the assets and scripts required to deploy on real robots or export trained policies to ONNX for inference.
⚠️ Attention: This project is for personal learning purposes only. The code may not be production-ready and should be used with caution.
- Simulation Environment: Custom WheelLeggedEnv environment built on the Genesis physics engine, defining robot dynamics, contact handling, observation and reward functions, with interfaces to the RL algorithms
- Robot Models: URDF/Xacro descriptions for the Unitree B2W and GO2W wheel-quadruped robots, located in assets/b2w_description and assets/go2w_description
- Reinforcement Learning: PPO/on-policy training pipeline implemented with an optimized version of the rsl_rl library, featuring adjustable hyperparameters and support for multiple parallel environments for fast training
- Evaluation & Deployment:
- Policy evaluation scripts (locomotion/wheel_legged_eval.py)
- Environment testing (model_test.py)
- ONNX export capability (onnx/pt2onnx.py) for deployment
- Visualization & Debugging:
- TensorBoard logging in logs/ for monitoring training progress
- Joystick/keyboard control support via utils/gamepad.py
- Debug Support: Debug logs and tuning notes in debug_record/ documenting common issues and solutions when modifying the URDF and environment
- Gamepad Teleoperation: Built-in gamepad remote control support
- Model Export: ONNX export support for trained policies, enabling deployment on various platforms
- Comprehensive Documentation: Detailed API documentation, usage guides, and troubleshooting notes
```
wheel_quadruped_genesis/
├── assets/         # URDF, xacro, and mesh assets for B2W & GO2W
├── locomotion/     # Training and simulation scripts
├── onnx/           # Script to convert JIT models to ONNX
├── rsl_rl/         # Local copy of rsl_rl RL framework
├── debug_record/   # Debug logs and troubleshooting notes
├── logs/           # TensorBoard logs of training runs
```
For detailed installation instructions, please refer to INSTALL.md. This document includes:
- Environment setup steps
- Dependencies installation guide
- Troubleshooting FAQ
The entrypoint locomotion/wheel_legged_train.py now delegates
to the layered training stack under locomotion/training/. This keeps argument
parsing, configuration management, environment construction, and learner orchestration in separate
modules so that experiments can be scripted or customised more easily.
Run a training session with the default configuration:
```bash
python locomotion/wheel_legged_train.py --config locomotion/config/wheel_legged.yaml --no-viewer
```

Key flags:

- --exp_name – experiment identifier used to create logs/<exp_name>/
- --log_dir – absolute/relative directory override for logs and checkpoints
- --config-override – apply overrides using dotted keys (e.g. --config-override train.runner.max_iterations=2000)
- --keep-existing-logs – skip deleting a pre-existing log directory before the run
- --num_envs, --device, --num_view, --[no-]viewer – control parallelism and visualisation
The stack persists a configuration snapshot (cfgs.pkl) next to checkpoints so evaluation or deployment
can recreate the exact environment. Override values are parsed with YAML semantics, allowing complex
structures (lists, dicts, booleans, numbers) to be injected from the command line.
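As a minimal sketch of how dotted-key overrides with YAML value parsing can be applied to a nested configuration dict (the project's actual implementation lives in training/configuration.py and may differ):

```python
import yaml

def apply_override(cfg: dict, dotted_key: str, raw_value: str) -> None:
    """Set a nested key such as 'train.runner.max_iterations' to a YAML-parsed value."""
    keys = dotted_key.split(".")
    node = cfg
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    # yaml.safe_load turns the raw string into the expected Python type,
    # so booleans, numbers, lists and dicts survive the command line.
    node[keys[-1]] = yaml.safe_load(raw_value)

cfg = {"train": {"runner": {"max_iterations": 1000}}}
apply_override(cfg, "train.runner.max_iterations", "2000")
apply_override(cfg, "train.runner.resume", "true")
print(cfg["train"]["runner"])  # {'max_iterations': 2000, 'resume': True}
```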
The training stack is organised as follows:

- training/cli.py – argument parser with reusable defaults
- training/configuration.py – loads YAML, applies overrides, and packages sections into a ConfigBundle
- training/env_factory.py – materialises WheelLeggedEnv instances from the bundle
- training/workflow.py – orchestrates logging setup, runner creation, training execution, and clean shutdown
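For scripted experiments the modules can be composed directly. The function names below are illustrative assumptions rather than the actual API and should be checked against the source:

```python
# Hypothetical scripted run; the exact function names in locomotion/training/
# may differ from those shown here.
from locomotion.training import cli, configuration, env_factory, workflow

args = cli.build_parser().parse_args([
    "--config", "locomotion/config/wheel_legged.yaml",
    "--config-override", "train.runner.max_iterations=2000",
    "--no-viewer",
])
bundle = configuration.load(args)     # YAML + overrides -> ConfigBundle
env = env_factory.make_env(bundle)    # materialise WheelLeggedEnv
workflow.run(env, bundle)             # logging, runner creation, training, shutdown
```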
TensorBoard can still be used to monitor progress:
```bash
tensorboard --logdir logs/
```

Use locomotion/wheel_legged_eval.py to reload the saved
configuration bundle and checkpoints for playback:
```bash
python locomotion/wheel_legged_eval.py --exp_name wheel-quadruped-walking --ckpt 8300
```

The script instantiates the environment through the new factory, restores the PPO runner, and exports
a TorchScript policy (policy.pt) for deployment tests. locomotion/model_test.py remains available for
quick smoke tests without policy loading.
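A quick way to sanity-check the exported policy is to load it with torch.jit and run a single forward pass. The path and observation size below are placeholders; use the log directory of your run and the observation dimension from your environment config:

```python
import torch

# Load the exported TorchScript policy and run one forward pass on a dummy observation.
policy = torch.jit.load("logs/wheel-quadruped-walking/policy.pt", map_location="cpu")
policy.eval()

obs = torch.zeros(1, 45)  # placeholder observation size; match your env config
with torch.no_grad():
    actions = policy(obs)
print(actions.shape)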
You can use a keyboard/gamepad to control the robot in simulation:
```bash
python rsl_rl/utils/gamepad_test.py
```
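If you want to prototype your own teleoperation input, a minimal joystick-polling loop with pygame looks like the sketch below; the repository's utils/gamepad.py may use a different backend and axis mapping, so treat the axis indices as assumptions:

```python
import time
import pygame

# Poll a connected gamepad and print candidate velocity commands.
pygame.init()
pygame.joystick.init()
stick = pygame.joystick.Joystick(0)
stick.init()

while True:
    pygame.event.pump()
    cmd_vx = -stick.get_axis(1)   # left stick vertical -> forward velocity (device-specific)
    cmd_wz = -stick.get_axis(3)   # right stick horizontal -> yaw rate (device-specific)
    print(f"cmd_vx={cmd_vx:+.2f}  cmd_wz={cmd_wz:+.2f}")
    time.sleep(0.05)
```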
To export the trained policy (JIT format) to ONNX:

```bash
python onnx/pt2onnx.py
```

This produces policy.onnx that can be used in real-time inference engines.
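For example, the exported graph can be exercised with onnxruntime. The input name is queried from the session, while the observation size below is a placeholder that depends on how pt2onnx.py builds the model:

```python
import numpy as np
import onnxruntime as ort

# Load the exported policy and run one inference step on a dummy observation.
session = ort.InferenceSession("policy.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

obs = np.zeros((1, 45), dtype=np.float32)  # placeholder obs dimension; match your env
actions = session.run(None, {input_name: obs})[0]
print(actions.shape)
```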
- URDF/Xacro parsing in Genesis may cause joint folding or DOF mismatch; see 调试寄路/记录1.md
- Some .pyc, build/, and CMakeFiles/ folders should be cleaned up before production use
This repository is adapted from:
URDF models are based on:
- Unitree B2/GO2 specifications (B2W & GO2W)
MIT License (unless specified otherwise by upstream projects)