We assume you have access to a GPU that can run CUDA 11.7 and cuDNN 8.5.
First, create an Anaconda environment with the necessary dependencies by running:
```
conda env create -f conda_env.yml
```
After the installation finishes, activate the environment with:
```
conda activate slac
```
Then install PyTorch built against CUDA 11.7:
```
pip install torch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 --index-url https://download.pytorch.org/whl/cu117
```
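A quick way to confirm the install picked up the right CUDA build (a minimal sanity check, not part of the repository):
```python
# Sanity check (not part of the repository): confirm the CUDA 11.7 build of
# PyTorch is installed and the GPU is visible from inside the conda environment.
import torch

print(torch.__version__)               # expected: 2.0.0+cu117
print(torch.version.cuda)              # expected: 11.7
print(torch.backends.cudnn.version())  # expected: a cuDNN 8.x build, e.g. 8500
print(torch.cuda.is_available())       # should print True on a working GPU setup
```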
Install the customized stable-baselines3:
```
cd ..
git clone https://github.com/JiahengHu/stable-baselines3-slac.git
cd stable-baselines3-slac
pip install -e .
```
Install the customized iGibson:
```
cd ..
git clone https://github.com/JiahengHu/iGibson-SLAC.git --recursive
cd iGibson-SLAC
conda install cmake
pip install -e .
```
If you encounter issues installing iGibson, check out the iGibson installation guide.
- Download the required iGibson data.
- Download the Modified Tiago URDF and put it into `iGibson-SLAC/igibson/data/assets/models/tiago/`.
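To confirm that the editable iGibson install and its assets are found, a quick check like the following can help (it assumes the fork keeps stock iGibson's module-level `assets_path` attribute):
```python
# Sanity check: verify that `import igibson` resolves to the iGibson-SLAC checkout
# and that the assets directory (where the Tiago URDF was placed) exists.
# Assumption: the fork exposes `assets_path` like stock iGibson does.
import os
import igibson

print(igibson.__file__)     # should point inside the iGibson-SLAC clone (editable install)
print(igibson.assets_path)  # should end with igibson/data/assets
print(os.path.isdir(os.path.join(igibson.assets_path, "models", "tiago")))  # True once the URDF is in place
```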
Additionally, install a recent libstdc++ into the environment and make sure it is picked up at runtime:
```
conda install -c conda-forge libstdcxx-ng
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
```
To learn latent actions, use the `pretrain.py` script:
```
python pretrain.py domain=igibson subdomain=wipe
```
The snapshots will be stored under the following directory:
```
./models/<obs_type>/<domain>/<agent>/<NAME>
```
Logs are stored on wandb and in the `exp_local` folder.
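The snapshots themselves are whatever `pretrain.py` serializes. Assuming they are standard `torch.save` checkpoints (an assumption, not verified here), a saved snapshot can be inspected like this:
```python
# Purely illustrative: peek inside a pretraining snapshot, assuming it was written
# with torch.save. Replace the placeholder path with an actual snapshot from
# ./models/<obs_type>/<domain>/<agent>/<NAME>.
import torch

snapshot_path = "./models/<obs_type>/<domain>/<agent>/<NAME>"  # placeholder path
snapshot = torch.load(snapshot_path, map_location="cpu")
print(type(snapshot))
if isinstance(snapshot, dict):
    print(list(snapshot.keys()))  # e.g. agent / optimizer / step entries, if present
```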
On wandb, you will see a training curve for the pretraining run.
To launch tensorboard, run:
```
tensorboard --logdir exp_local
```
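If you want to post-process the metrics programmatically rather than through the TensorBoard UI, the event files under `exp_local` can be read back with TensorBoard's `EventAccumulator`; the directory layout and tag names below are assumptions to adapt to your run:
```python
# Read scalar metrics back from a TensorBoard event directory.
# Point `log_dir` at the specific run folder under exp_local that contains the
# events.out.tfevents.* file; the exact layout depends on your run.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

log_dir = "exp_local"  # placeholder: use the actual run directory
acc = EventAccumulator(log_dir)
acc.Reload()

tags = acc.Tags()["scalars"]
print(tags)  # available scalar tags
if tags:
    for event in acc.Scalars(tags[0]):
        print(event.step, event.value)
```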
The console output is also available in the following form:
```
| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42
```
A training entry decodes as:
- `F`: total number of environment frames
- `S`: total number of agent steps
- `E`: total number of episodes
- `L`: episode length
- `R`: episode return
- `FPS`: training throughput (frames per second)
- `T`: total training time
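If you need these console lines in machine-readable form (e.g., to plot returns without wandb), a few lines of Python can split them into fields; the parsing below only assumes the `| KEY: VALUE |` layout shown in the example line, not the repository's logger implementation:
```python
# Parse a console training line of the form shown above into a dict of strings.
def parse_train_line(line: str) -> dict:
    fields = [f.strip() for f in line.strip().strip("|").split("|")]
    record = {"mode": fields[0]}
    for entry in fields[1:]:
        key, value = entry.split(":", 1)  # split once so "T: 0:00:42" keeps its colons
        record[key.strip()] = value.strip()
    return record

example = "| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42"
print(parse_train_line(example))
```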
You can randomly roll out the learned latent actions with:
```
python test_skills.py snapshot_name=<NAME> domain=igibson subdomain=wipe
```
Note: in the paper, this is done entirely on a real robot with human-provided reward. Here we provide a simulated version of the downstream task only for demonstration purposes.
Finally, you can use a saved snapshot of the latent action decoder to learn a downstream task:
```
python train.py n_env=1 domain=wipe factored=True low_path=<PATH/TO/LOW_LEVEL_POLICY>
```
On wandb, you will see a training curve for the downstream run.
The high-level policy weights will be stored at the following path:
```
./downstream/<date>/<folder>/high_level_policy.zip
```
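Since the high-level policy is trained with the customized stable-baselines3 and saved as a `.zip`, it can presumably be loaded back through the usual stable-baselines3 interface. The sketch below assumes that; the concrete algorithm class (`PPO` here) and the path are placeholders, not confirmed from `train.py`:
```python
# Illustrative only: load the saved high-level policy with stable-baselines3.
# Assumptions: train.py saves an SB3 model via model.save(); PPO is a stand-in
# for whatever algorithm class the repository actually uses.
from stable_baselines3 import PPO

model = PPO.load("./downstream/<date>/<folder>/high_level_policy.zip")  # placeholder path
print(model.observation_space)
print(model.action_space)
```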
You can visualize the learned policy by running:
```
python train.py n_env=1 factored=True test=True test_weight=<PATH/TO/HIGH_LEVEL_POLICY> low_path=<PATH/TO/LOW_LEVEL_POLICY> vis=True
```
For the real-robot setup, please refer to tiago_gym.
If you use this code in your research, please cite our paper:
```
@misc{hu2025slacsimulationpretrainedlatentaction,
      title={SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL},
      author={Jiaheng Hu and Peter Stone and Roberto Martín-Martín},
      year={2025},
      eprint={2506.04147},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2506.04147},
}
```
This codebase is partially based on DUSDi.