Yunzhi Zhang, Zizhang Li, Matt Zhou, Shangzhe Wu, Jiajun Wu. CVPR 2025.
conda create --name sclg python=3.12
conda activate sclg
git clone https://github.com/zzyunzhi/scene-language.git
cd scene-language
pip install -e .
# required for minecraft renderer
pip install spacy
python -m spacy download en_core_web_md
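As a quick sanity check for the Minecraft renderer dependency, you can confirm that the spaCy model loads (a minimal snippet, not part of the repository):

```python
import spacy

# Loads the model installed above; raises OSError if the download failed.
nlp = spacy.load("en_core_web_md")
print(nlp("medieval tower").vector.shape)  # the "md" model ships with word vectors
```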
Run `python scripts/tests/test_basic.py` to check that the installation is successful.
If you don't have API keys, please follow the instructions here.
Otherwise, get your Anthropic API key following the official documentation and add it to `engine/key.py`:
ANTHROPIC_API_KEY = 'YOUR_ANTHROPIC_API_KEY'
We recommend using Claude 3.7 Sonnet, which is the default. You may switch to other language models here.
python scripts/run.py --tasks "a chessboard with a full set of chess pieces"
# Experimental
python scripts/run_self_reflect_with_moe.py --tasks "Sponge Bob and friends"
Renderings will be saved to `${PROJ_ROOT}/scripts/outputs/run_${timestep}_${uuid}/${scene_name}_${uuid}/${sample_index}/renderings/*.gif`.
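To gather the rendered GIFs across runs programmatically, a small glob over the output directory is enough (a sketch; the concrete `run_*` directory names follow the pattern above and depend on your run):

```python
from pathlib import Path

# Matches run_${timestep}_${uuid}/${scene_name}_${uuid}/${sample_index}/renderings/*.gif
out_root = Path("scripts/outputs")
for gif in sorted(out_root.glob("run_*/*/*/renderings/*.gif")):
    # Group renderings by scene directory (two levels above "renderings").
    print(gif.parents[2].name, "->", gif)
```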
Example results with Claude 3.5 Sonnet (please use this download link for raw results including prompts, LLM responses, and renderings):
"a chessboard with a full set of chess pieces" | "A 9x9 Sudoku board partially filled with numbers" | "a scene inspired by Egon Schiele" | "a Roman Colosseum" | "a spider puppet" |
---|---|---|---|---|
![]() | ![]() | ![]() | ![]() | ![]() |
ENGINE_MODE=minecraft python scripts/run.py --tasks "a detailed cylindrical medieval tower"
Generated scenes are saved as JSON files in `${PROJ_ROOT}/scripts/outputs/run_${timestep}_${uuid}/${scene_name}_${uuid}/${sample_index}/renderings/*.json`.
For visualization, run the following command:
python viewers/minecraft/run.py
Then open http://127.0.0.1:5001 in your browser and drag the generated JSON files onto the page.
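If you prefer to inspect the generated scene files from Python instead of the web viewer, a quick peek at their structure looks like this (a sketch; the JSON schema is not documented here, so it only prints the top-level structure):

```python
import json
from pathlib import Path

for path in sorted(Path("scripts/outputs").glob("run_*/*/*/renderings/*.json")):
    with open(path) as f:
        scene = json.load(f)
    # The file may hold a list of blocks or a dict; stay schema-agnostic.
    if isinstance(scene, dict):
        summary = f"dict with keys {list(scene)}"
    else:
        summary = f"{type(scene).__name__} of length {len(scene)}"
    print(path.name, "->", summary)
```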
Example results:
"a witch's house in Halloween" | "a detailed cylindrical medieval tower" | "a detailed model of Picachu" | "Stonehenge" | "a Greek temple" |
---|---|---|---|---|
![]() | ![]() | ![]() | ![]() | ![]() |
Coming soon.
python scripts/run.py --tasks ./resources/examples/* --cond image --temperature 0.8
# Replace with your actual experiment paths, wildcards supported (e.g., "run_*/*/0" or "**/*")
python scripts/postprocess/export.py --exp-patterns "run_${timestep}_${uuid}/${scene_name}_${uuid}/${sample_index}"
The output will contain visualizations of hierarchical parts of the scene and exported `*.ply` files. Below are examples on two scenes, where each column corresponds to one hierarchy level, visualized with randomized colors; an example of loading the exported meshes follows the table. Results in this section are obtained with Claude 3.7 Sonnet. Raw LLM outputs can be found in the same download link as above.
"a large-scale city" | Level: 0 | Level: 1 | Level: 2 |
---|---|---|---|
![]() | ![]() | ![]() | ![]() |
"Basilica de la Sagrada Familia" | Level: 0 | Level: 1 | Level: 2 |
![]() | ![]() | ![]() | ![]() |
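To load an exported mesh into Python for further processing, something like the following works (a sketch assuming `trimesh` is installed via `pip install trimesh`; the file path is a placeholder for one of the exported `*.ply` files):

```python
import trimesh

mesh_path = "path/to/exported_part.ply"  # placeholder: substitute an exported *.ply file
mesh = trimesh.load(mesh_path)

if isinstance(mesh, trimesh.Trimesh):
    print(f"{mesh_path}: {len(mesh.vertices)} vertices, {len(mesh.faces)} faces")
else:
    # trimesh.load returns a Scene when the file contains multiple geometries.
    print(f"{mesh_path}: scene with {len(mesh.geometry)} geometries")
```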
The script above constructs entity hierarchy from a program’s call graph—each increase in call depth denotes a deeper hierarchy level (levels 0, 1, 2, etc. as in the table above). If you instead want to manually specify which functions should be treated as leaf nodes, run the following command:
python scripts/postprocess/truncate.py --exp-patterns "run_${timestep}_${uuid}/${scene_name}_${uuid}/${sample_index}" --skip-prompt
You can further load the exported assets from above into a physics simulator. Below is an example script and its output.
# pip install git+https://github.com/google-research/kubric.git
python scripts/experimental/simulate_pybullet.py
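For reference, a bare-bones PyBullet loop looks roughly like the following. This is not the project's script; the falling box is a stand-in for the exported assets (`pip install pybullet` assumed):

```python
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)  # headless; use p.GUI to watch the simulation
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.8)
p.loadURDF("plane.urdf")  # ground plane shipped with pybullet_data

# Stand-in rigid body; the exported scene assets would be loaded here instead.
box = p.createMultiBody(
    baseMass=1.0,
    baseCollisionShapeIndex=p.createCollisionShape(p.GEOM_BOX, halfExtents=[0.1, 0.1, 0.1]),
    basePosition=[0, 0, 1.0],
)

for _ in range(240):  # one simulated second at the default 240 Hz timestep
    p.stepSimulation()

print(p.getBasePositionAndOrientation(box))
p.disconnect()
```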
Macro definitions
The following table lists helper functions defined in this file and their corresponding expressions in the domain-specific language (DSL) (Tables 2 and 5 of the paper); a usage sketch follows the table:
Implementation | DSL |
---|---|
`register` | `bind` |
`library_call` | `call` |
`primitive_call` | `call` |
`loop` | `union-loop` |
`concat_shapes` | `union` |
`transform_shape` | `transform` |
`rotation_matrix` | `rotation` |
`translation_matrix` | `translate` |
`scale_matrix` | `scale` |
`reflection_matrix` | `reflect` |
`compute_shape_center` | `compute-shape-center` |
`compute_shape_min` | `compute-shape-min` |
`compute_shape_max` | `compute-shape-max` |
`compute_shape_sizes` | `compute-shape-sizes` |
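To illustrate how these helpers compose, below is a hypothetical program in the style of the generated examples. Everything other than the helper names in the table is an assumption for illustration; in particular, the arguments to `primitive_call` and the exact `loop` signature may differ from the real definitions, so refer to the linked file and the example outputs for the actual interface.

```python
# Assumes the DSL helpers (register, primitive_call, library_call, loop,
# transform_shape, translation_matrix, concat_shapes) and the Shape type
# are imported from the project's helper module.

@register()
def leg() -> Shape:
    # A single primitive; size/color arguments are omitted here on purpose.
    return primitive_call('cube')

@register()
def table() -> Shape:
    # loop(n, fn) concatenates fn(0) .. fn(n - 1); here it places four legs at the corners.
    legs = loop(4, lambda i: transform_shape(
        library_call('leg'),
        translation_matrix(((-1) ** (i % 2) * 0.4, 0.0, (-1) ** (i // 2) * 0.4)),
    ))
    top = transform_shape(primitive_call('cube'), translation_matrix((0.0, 0.5, 0.0)))
    return concat_shapes(legs, top)
```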
The pipeline is sensitive to small changes in the prompts, as shown here. We recommend running each prompt with a few variations for better results.
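One simple way to do this is to sweep a few paraphrases and temperatures of the same prompt; the sketch below reuses the `--tasks` and `--temperature` flags shown above, with arbitrary example paraphrases:

```python
import subprocess

# Hypothetical sweep: the same scene described three ways, two sampling temperatures each.
prompts = [
    "a chessboard with a full set of chess pieces",
    "a chess set arranged on a chessboard",
    "a wooden chessboard with all 32 chess pieces in starting position",
]
for prompt in prompts:
    for temperature in (0.6, 0.8):
        subprocess.run(
            ["python", "scripts/run.py", "--tasks", prompt, "--temperature", str(temperature)],
            check=True,
        )
```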
The current codebase allows you to generate 3D scenes with text or image prompts. Other tasks and renderers reported in the paper will be supported in future updates.
Please open a GitHub issue or email us if you encounter any issues.
If you find this work useful, please consider citing the paper:
@inproceedings{zhang2025scenelanguage,
title={The scene language: Representing scenes with programs, words, and embeddings},
author={Zhang, Yunzhi and Li, Zizhang and Zhou, Matt and Wu, Shangzhe and Wu, Jiajun},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
pages={24625--24634},
year={2025}
}