RGBLanguageGrounding is a deep learning model for unified 3D reconstruction and queryable segmentation from posed RGB images.
Disclaimer: This project does not contain the most recent work, but is intended to showcase some of the functionality.
Create environment and install dependencies.
conda create -n rgblg python==3.10.14
conda activate rgblg
First, follow the PyTorch setup instructions to install PyTorch with CUDA support.
pip install \
matplotlib \
pillow \
numpy \
scikit-image \
scipy \
timm \
"tqdm>=4.65" \
trimesh \
pytorch_lightning==1.8 \
pyyaml \
opencv-python-headless \
python-box \
tensorboard \
open_clip_torch
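After installation, it can help to confirm that every dependency resolves inside the `rgblg` environment. The following is a minimal sketch, not part of this repository: the helper name `missing_modules` is hypothetical, and the import names listed (for example `PIL` for pillow, `cv2` for opencv-python-headless, `box` for python-box) are the usual top-level module names for the packages above.

```python
import importlib.util
from typing import Iterable, List

# Top-level import names for the packages installed above.
# Note that several differ from their pip package names.
REQUIRED_MODULES = [
    "torch", "matplotlib", "PIL", "numpy", "skimage", "scipy", "timm",
    "tqdm", "trimesh", "pytorch_lightning", "yaml", "cv2", "box",
    "tensorboard", "open_clip",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be located, without importing them."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that the modules can be found; it does not verify versions or that PyTorch was built with CUDA support.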
For additional usage, follow the instructions of the parent project, FineRecon.
pyramid_test.py tiles patches of different resolutions over the input image to assist with fine-tuning.
python language_grounding/pyramid_test.py \
--src-img ./media/sample_scene.jpg \
--output-dir ./output \
--query "a chair"
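To illustrate the idea of tiling patches at several resolutions, here is a minimal sketch that computes crop boxes for a multi-resolution grid. It is not the implementation in pyramid_test.py: the function names `tile_boxes` and `pyramid_boxes`, the patch sizes, and the edge-handling rule are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def tile_boxes(width: int, height: int, patch: int,
               stride: Optional[int] = None) -> List[Box]:
    """Square crop boxes of side `patch` covering a width x height image.

    Hypothetical helper: tiles on a regular grid and, if the grid does not
    reach the right or bottom border, adds one extra row or column flush
    with that border so the whole image is covered.
    """
    stride = patch if stride is None else stride
    if patch <= 0 or stride <= 0:
        raise ValueError("patch and stride must be positive")
    if patch > width or patch > height:
        return []  # the patch does not fit inside the image

    xs = list(range(0, width - patch + 1, stride))
    ys = list(range(0, height - patch + 1, stride))
    if xs[-1] != width - patch:
        xs.append(width - patch)
    if ys[-1] != height - patch:
        ys.append(height - patch)
    return [(x, y, x + patch, y + patch) for y in ys for x in xs]


def pyramid_boxes(width: int, height: int,
                  patch_sizes: Sequence[int]) -> Dict[int, List[Box]]:
    """Map each patch size to its list of crop boxes (one pyramid level each)."""
    return {p: tile_boxes(width, height, p) for p in patch_sizes}


if __name__ == "__main__":
    # Assumed image size and patch sizes, for illustration only.
    for size, boxes in pyramid_boxes(640, 480, [128, 256]).items():
        print(f"patch {size}: {len(boxes)} tiles")
```

The boxes use the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so each one could be cropped from the source image and scored against the text query separately.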
