Unofficial open-source implementation of AlignSAM (CVPR 2024).
This repository provides a reinforcement learning framework for interactive segmentation using SAM (Segment Anything Model) and CLIP Surgery, inspired by the AlignSAM approach. It includes custom Gymnasium environments, agent architectures, and training scripts for PPO-based policy optimization. For more details, please refer to the Design Doc.
- Custom Gymnasium environment for interactive segmentation with RepViT-SAM integration
- Explicit and implicit agent architectures using CLIP Surgery and SAM embeddings
- PPO training loop with MLflow and TensorBoard logging
- YAML-based configuration for agents and environments
- RepViT-SAM model wrapper
- Model export for inference/deployment
- Support for batch inference for SAM
- Support for sharing dataset instance across environments to reduce memory
- Support for distributed training
- Mixed-precision (AMP) training
- Integration with more datasets (ADE20K, Cityscapes, etc.)
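To make the environment idea more concrete, here is a minimal, dependency-free sketch of the reset/step contract that a point-prompt segmentation environment could follow. It mirrors the Gymnasium return signatures (`reset` returns `(obs, info)`, `step` returns `(obs, reward, terminated, truncated, info)`) but does not import Gymnasium. The class `PointPromptSegEnv`, the grid-based action encoding, the IoU-improvement reward, the 0.95 success threshold, and the toy `fake_segmenter` standing in for SAM are all hypothetical illustrations, not this repository's implementation.

```python
from typing import Callable, Dict, List, Set, Tuple

Mask = Set[Tuple[int, int]]    # set of (row, col) foreground pixels
Prompt = Tuple[int, int, int]  # (row, col, label): label 1 = positive, 0 = negative


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two pixel sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class PointPromptSegEnv:
    """Hypothetical environment following Gymnasium's reset/step shape.

    Assumed action space: an integer in [0, 2 * grid * grid). The low half places a
    positive click at the centre of a grid cell, the high half a negative click.
    Assumed reward: the change in IoU between predicted and ground-truth masks.
    """

    def __init__(self, gt_mask: Mask, image_size: int, grid: int,
                 segmenter: Callable[[List[Prompt]], Mask], max_steps: int = 5):
        self.gt_mask = gt_mask
        self.grid = grid
        self.cell = image_size // grid  # pixel size of one grid cell
        self.segmenter = segmenter      # stand-in for a SAM-style model
        self.max_steps = max_steps

    def reset(self) -> Tuple[Dict, Dict]:
        self.prompts: List[Prompt] = []
        self.steps = 0
        self.prev_iou = 0.0
        return {"prompts": []}, {}

    def step(self, action: int):
        n_cells = self.grid * self.grid
        label = 1 if action < n_cells else 0
        r, c = divmod(action % n_cells, self.grid)
        # Click at the centre of the chosen grid cell.
        half = self.cell // 2
        self.prompts.append((r * self.cell + half, c * self.cell + half, label))
        cur_iou = iou(self.segmenter(self.prompts), self.gt_mask)
        reward = cur_iou - self.prev_iou  # reward = IoU improvement (assumption)
        self.prev_iou = cur_iou
        self.steps += 1
        terminated = cur_iou >= 0.95        # assumed success threshold
        truncated = self.steps >= self.max_steps
        return {"prompts": list(self.prompts)}, reward, terminated, truncated, {"iou": cur_iou}


def fake_segmenter(prompts: List[Prompt], cell: int = 2) -> Mask:
    """Toy stand-in for SAM: each click adds (positive) or removes (negative) its cell."""
    mask: Mask = set()
    for row, col, label in prompts:
        r0, c0 = (row // cell) * cell, (col // cell) * cell
        block = {(r0 + i, c0 + j) for i in range(cell) for j in range(cell)}
        mask = mask | block if label == 1 else mask - block
    return mask
```

For example, with a 4x4 image split into a 2x2 grid and a ground-truth mask covering the top-left cell, `env.step(0)` places a positive click in that cell, reaches an IoU of 1.0, and terminates the episode with a reward of 1.0.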
You can download the latest checkpoints from
Clone the repository and submodules:
git clone --recurse-submodules https://github.com/yourusername/AlignSAM-CVPR2024-Unofficial-.git
cd AlignSAM-CVPR2024-Unofficial-
- Create and activate a virtual environment (venv) or conda environment with Python 3.8.
- Then run the following command from the root folder:
bash install_dependencies.sh
- Create a data directory and download or symlink the COCO dataset inside it. Follow the reference directory structure below to work with the existing config files:
data/
├── images/
│   └── val2017/
│       ├── 00000001.jpg
│       ├── 00000002.jpg
│       └── ...
└── annotations/
    └── instances_val2017.json
- Edit the configuration files in configs/ to update paths, categories, etc.
- Start training:
python train_sam_align_ppo.py --agent_cfg_path configs/agents/explicit_agent.yaml --env_cfg_path configs/envs/repvit_sam_coco.yaml
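Before launching training, it can help to confirm that the data directory matches the reference layout and to see which category names the annotation file offers, which is useful when updating categories in the config files. The following stdlib-only sketch assumes the `data/images/val2017` and `data/annotations/instances_val2017.json` paths from the reference structure. The `preflight_check` helper is hypothetical and not part of the repository.

```python
import json
from pathlib import Path
from typing import Dict


def preflight_check(data_root: str = "data", split: str = "val2017") -> Dict[int, str]:
    """Verify the expected COCO layout and return a {category_id: name} map.

    Hypothetical helper: the paths follow the reference directory structure.
    """
    root = Path(data_root)
    image_dir = root / "images" / split
    ann_file = root / "annotations" / f"instances_{split}.json"

    if not image_dir.is_dir():
        raise FileNotFoundError(f"missing image directory: {image_dir}")
    if not any(image_dir.glob("*.jpg")):
        raise FileNotFoundError(f"no .jpg images found in {image_dir}")
    if not ann_file.is_file():
        raise FileNotFoundError(f"missing annotation file: {ann_file}")

    with ann_file.open() as f:
        coco = json.load(f)
    # COCO instance annotations list categories as {"id": ..., "name": ..., "supercategory": ...}.
    return {cat["id"]: cat["name"] for cat in coco["categories"]}
```

Calling `preflight_check("data")` either raises `FileNotFoundError` naming the first missing piece or returns a mapping such as `{1: 'person', ...}` whose names can be copied into the config files.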
.
├── train_sam_align_ppo.py # Main training script
├── models/ # Agent architectures
├── custom_gym_implns/ # Custom Gymnasium environments and wrappers
├── configs/ # YAML configuration files
├── datasets/ # Dataset utilities
├── RepViT/ # RepViT submodule (SAM backbone)
├── CLIP_Surgery/ # CLIP Surgery submodule
├── requirements.txt
├── install_dependencies.sh
└── README.md
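train_sam_align_ppo.py optimizes the agent with PPO. As a refresher on what that optimization targets, here is a plain-Python sketch of PPO's clipped surrogate objective, min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r is the ratio of new to old action probabilities and A is the advantage. This is the standard formulation from the original PPO paper, not code taken from this repository's training script, which may add value and entropy terms, advantage normalization, or other details. The function name `ppo_clip_loss` is illustrative.

```python
import math
from typing import Sequence


def ppo_clip_loss(new_logp: Sequence[float], old_logp: Sequence[float],
                  advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Negative clipped surrogate objective averaged over samples (lower is better)."""
    if not advantages or not (len(new_logp) == len(old_logp) == len(advantages)):
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

For instance, a sample whose action probability rose by 50% (ratio 1.5) with advantage 1.0 contributes only 1.2 to the objective when `clip_eps = 0.2`, so `ppo_clip_loss([math.log(1.5)], [0.0], [1.0])` is -1.2. Clipping removes the incentive to push the policy far from the one that collected the data.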
This project is licensed under the MIT License. See LICENSE for details.



