tsinghua-fib-lab/World-Model

Awesome-World-Model

A curated list of awesome resources on World Models, based on the comprehensive survey "Understanding World or Predicting Future? A Comprehensive Survey of World Models".

News🔥

  • [2024/11/21] Initial release of our survey is available on arXiv.
  • [2025/06/13] Our survey paper "Understanding World or Predicting Future? A Comprehensive Survey of World Models" has been accepted by ACM Computing Surveys.
  • [2025/06/25] Second version of our survey is available on arXiv.
  • [2025/07/18] Initial release of the Awesome-World-Model GitHub repository.
  • [2025/11/18] Third version of our survey is available on arXiv.

Contact

If you have any suggestions or find our work helpful, feel free to contact us.
Email: [email protected]

If this list helps your research, please ⭐ and cite:

@article{ding2025understanding,
  title={Understanding World or Predicting Future? A Comprehensive Survey of World Models},
  author={Ding, Jingtao and Zhang, Yunke and Shang, Yu and Zhang, Yuheng and Zong, Zefang and Feng, Jie and Yuan, Yuan and Su, Hongyuan and Li, Nian and Sukiennik, Nicholas and others},
  journal={ACM Computing Surveys},
  volume={58},
  number={3},
  pages={1--38},
  year={2025},
  publisher={ACM New York, NY}
}

Table of Contents 🍃

Roadmap of world models in deep learning era

Model-based RL

Title Pub. & Date Code/Project URL
Recurrent world models facilitate policy evolution (RWM) NeurIPS 2018 Website
Learning Latent Dynamics for Planning from Pixels (PlaNet) ICML 2019 Star
Dream to control: Learning behaviors by latent imagination (Dreamer V1) ICLR 2020 Star
Mastering atari with discrete world models (Dreamer V2) ICLR 2021 Star
Temporal Difference Learning for Model Predictive Control (TD-MPC1) ICML 2023 Star
Mastering Diverse Domains through World Models (Dreamer V3) 2023 Star
TD-MPC2: Scalable, Robust World Models for Continuous Control (TD-MPC2) ICLR 2024 Star
PWM: Policy Learning with Multi-Task World Models (PWM) ICLR 2025 Star

Self-supervised learning

Title Pub. & Date Code/Project URL
A path towards autonomous machine intelligence version 0.9.2, 2022-06-27 (JEPA) 2024
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning (DINO-WM) 2024 Star
Revisiting Feature Prediction for Learning Visual Representations from Video (V-JEPA) 2024 Star
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning (V-JEPA2) 2025 Star

LLM/MLLM

Title Pub. & Date Code/Project URL
Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning (LLM-DM) NeurIPS 2023 Star
WorldGPT: Empowering LLM as Multimodal World Model (WorldGPT) ACM MM 2024 Star
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation (Text2World) ACL 2025 Star

Video generation

Title Pub. & Date Code/Project URL
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers (CogVideo) ICLR 2023 Star
Structure and Content-Guided Video Synthesis with Diffusion Models (Gen‑1) ICCV 2023 Website
UniSim: Learning Interactive Real-World Simulators (UniSim) ICLR 2024 Website
Sora: Creating video from text (Sora) OpenAI 2024
World model on million-length video and language with ring-attention (LWM) ICLR 2025 Star
Genie: Generative Interactive Environments (Genie) ICML 2024 Website
iVideoGPT: Interactive VideoGPTs are Scalable World Models (iVideoGPT) NeurIPS 2024 Star
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer (CogVideoX) ICLR 2025 Star
Wan: Open and Advanced Large-Scale Video Generative Models (Wan) 2025 Star
Cosmos World Foundation Model Platform for Physical AI (Cosmos) 2025 Star

Interactive 3D environment

Title Pub. & Date Code/Project URL
Interactive 3D Scene Generation from a Single Image (WonderWorld) CVPR 2025 Star
Matrix-3D: Omnidirectional Explorable 3D World Generation (Matrix-3D) 2025 Star
HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels (Hunyuan World) 2025 Star

Application

Title Pub. & Date Code/Project URL
DayDreamer: World Models for Physical Robot Learning (DayDreamer) 2023 Star
Generative Agents: Interactive Simulacra of Human Behavior (Generative Agents) UIST 2023 Star
GAIA-1: A generative world model for autonomous driving (GAIA-1) 2023 Website
OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving (OccWorld) ECCV 2024 Star
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation (GR1) ICLR 2024 Star
DriveDreamer: Towards real-world-driven world models (DriveDreamer) ECCV 2024 Star
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving (Drive-WM) CVPR 2024 Star
Think2Drive: Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving (in CARLA-v2) (Think2Drive) ECCV 2024 Website
GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving (GaussianWorld) CVPR 2025 Star
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving (Drive-WM) 2025 Website
World and Human Action Models towards Gameplay Ideation (WHAM) Nature 2025
Mineworld: a Real-time and Open-source Interactive World Model on Minecraft (MineWorld) 2025 Star
GameFactory: Creating New Games with Generative Interactive Videos (GameFactory) ICCV 2025 Star
AgentSociety: Large-scale simulation of LLM-driven generative agents (AgentSociety) ACL 2025, COLM 2025 Star
EnerVerse-AC: Envisioning Embodied Environments with Action Condition (EnerVerse) 2025 Star
GR-3 Technical Report (GR3) 2025 Website
Aether: Geometric-Aware Unified World Modeling (Aether) 2025 Star
GWM: Towards Scalable Gaussian World Models for Robotic Manipulation (GWM) ICCV 2025 Website
AirScape: An Aerial Generative World Model with Motion Controllability (AirScape) ACM MM 2025 Website
RoboScape: Physics-informed Embodied World Model (RoboScape) 2025 Star
DreamGen: Unlocking Generalization in Robot Learning through Video World Models (DreamGen) 2025 Star
Matrix-Game: Interactive World Foundation Model (Matrix-Game) 2025 Star
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation (Genie Envisioner) 2025 Star

3 Implicit Representation of the External World

3.1 World Model in Decision Making

Title Pub. & Date Code/Project URL
Deep reinforcement learning in a handful of trials using probabilistic dynamics models NeurIPS 2018 Star
PWM: Policy Learning with Multi-Task World Models ICLR 2025 Star
Recurrent world models facilitate policy evolution NeurIPS 2018 Website
Dream to control: Learning behaviors by latent imagination ICLR 2020 Star
Leveraging pre-trained large language models to construct and utilize world models for model-based task planning NeurIPS 2023 Star
Mastering atari with discrete world models ICLR 2021 Star
Mastering diverse control tasks through world models Nature 2024 Star
TD-MPC2: Scalable, Robust World Models for Continuous Control ICLR 2024 Star
When to trust your model: Model-based policy optimization NeurIPS 2019 Star
Offline reinforcement learning as one big sequence modeling problem NeurIPS 2021 Star
Model predictive control Springer
Algorithmic framework for model-based deep reinforcement learning with theoretical guarantees ICLR 2019 Star
Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning ICRA 2018 Star
A game theoretic framework for model based reinforcement learning ICML 2021 Star
General agents need world models ICML 2025
Mastering memory tasks with world models ICLR 2024 Star
A generalist dynamics model for control arXiv 2023
Exploring model-based planning with policy networks ICLR 2020 Star
A0c: Alpha zero in continuous action space arXiv 2018 Star
Probabilistic adaptation of text-to-video models ICLR 2024 Website
RoboDreamer: Learning Compositional World Models for Robot Imagination ICML 2024 Star
Discuss before moving: Visual language navigation via multi-expert discussions ICRA 2024 Star
OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and Structured Representation CVPR 2024 Star
RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation CVPR 2024 Website
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models arXiv 2025
Position: LLMs can't plan, but can help planning in LLM-modulo frameworks ICML 2024
Language models meet world models: Embodied experiences enhance language models NeurIPS 2023 Star
Virtualhome: Simulating household activities via programs CVPR 2018 Star
Learning to Model the World with Language ICML 2024 Star
Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency ICML 2024 Star
Alfworld: Aligning text and embodied environments for interactive learning ICLR 2021 Star
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents EMNLP 2024 Star
Agent Planning with World Knowledge Model NeurIPS 2024 Star
WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment NeurIPS 2024 Star
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation ICLR 2025 Star

3.2 World Knowledge Learned by Models

Title Pub. & Date Code / Project URL
Does the chimpanzee have a theory of mind? Behav. & Brain Sci. 1978
GPT4GEO: How a Language Model Sees the World’s Geography NeurIPS 2023 Star
LLMs achieve adult human performance on higher-order theory of mind tasks arXiv 2024
COKE: A cognitive knowledge graph for machine theory of mind ACL 2024 Star
Think Twice: Perspective-Taking Improves LLM Theory-of-Mind ACL 2024 Star
Language Models Represent Space and Time ICLR 2024 Star
GeoLLM: Extracting Geospatial Knowledge from Large Language Models ICLR 2024 Star
Large language models are geographically biased ICML 2024 Star
Emergent Representations of Program Semantics in Language Models Trained on Programs ICML 2024 Star
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages NeurIPS 2024 Star
SafeWorld: Geo-Diverse Safety Alignment NeurIPS 2024 Star
EAI: Emotional Decision-Making of LLMs in Strategic Games and Ethical Dilemmas NeurIPS 2024 Star
Testing theory of mind in large language models and humans Nature Human Behaviour 2024 Website
Automated construction of cognitive maps with visual predictive coding Nature Machine Intelligence 2024 Star
Evaluating Large Language Models in Theory of Mind Tasks PNAS 2024 Website
Elements of World Knowledge (EWOK) Transactions of the ACL 2025 Website
The Geometry of Concepts: Sparse Autoencoder Feature Structure Entropy 2025 Star
AgentMove: A large language model based agentic framework for zero-shot next location prediction NAACL 2025 Star
CityGPT: Empowering Urban Spatial Cognition of Large Language Models KDD 2025 Star
CityBench: Evaluating the Capabilities of Large Language Model as World Model KDD 2025 Star
LocalGPT: Benchmarking and Advancing Large Language Models for Local Life Services KDD 2025 Star
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence ICCV 2025 Star
Open-Set Living Need Prediction with Large Language Models ACL 2025 Findings Star
Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation ACL 2025 Findings Website
Mitigating Geospatial Knowledge Hallucination in Large Language Models: Benchmarking and Dynamic Factuality Aligning EMNLP 2025 Findings Star
GPS as a Control Signal for Image Generation CVPR 2025 Star
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages CVPR 2025 Website
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models CVPR 2025 Star
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces CVPR 2025 Website
A Survey of Large Language Model-Powered Spatial Intelligence Across Scales arXiv 2025
AI's Blind Spots: Geographic Knowledge and Diversity Deficit in Generated Urban Scenario arXiv 2025
Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models arXiv 2025

4 Future Prediction of the Physical World

4.1 World Model as Video Generation

Title Pub. & Date Code / Project URL
Video generation models as world simulators OpenAI Blog 2024
Sora: Creating video from text OpenAI 2024
Is Sora a world simulator? A comprehensive survey on general world models and beyond arXiv 2024 Star
Sora as an AGI world model? A complete survey on text-to-video generation arXiv 2024
How Far is Video Generation from World Model: A Physical Law Perspective ICML 2025 Star
Do generative video models learn physical principles from watching videos? arXiv 2025 Star
Genesis: A Generative and Universal Physics Engine for Robotics and Beyond ICML 2024 Star
PhysGen: Rigid-body physics-grounded image-to-video generation ECCV 2024 Star
NUWA-XL: Diffusion over Diffusion for Extremely Long Video Generation ACL 2023 Website
OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving ECCV 2024 Star
OccSora: 4D Occupancy Generation Models as World Simulators ICLR 2025 Star
World model on million-length video and language with ring-attention ICLR 2025 Star
GAIA-1: A generative world model for autonomous driving arXiv 2023 Website
DriveDreamer: Towards real-world-driven world models ECCV 2024 Star
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation AAAI 2025 Star
Driving into the Future: Multiview Visual Forecasting and Planning with World Model CVPR 2024 Star
Vista: A Generalizable Driving World Model with High Fidelity NeurIPS 2024 Star
WorldDreamer: Towards general world models for video generation arXiv 2024 Star
WorldGPT: a Sora-inspired video AI agent arXiv 2024

4.2 World Model as Embodied Environment

Title Pub. & Date Code / Project URL
Holodeck: Language guided generation of 3d embodied ai environments CVPR 2024 Star
GRUtopia: Dream General Robots in a City at Scale arXiv 2024 Star
Anyhome: Open-vocabulary generation of structured and textured 3d homes ECCV 2024 Star
LEGENT: Open Platform for Embodied Agents arXiv 2024 Star
UrbanWorld: An Urban World Model for 3D City Generation arXiv 2024 Star
MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility ICLR 2025 Star
Minedojo: Building open-ended embodied agents with internet-scale knowledge NeurIPS 2022 Star
UniSim: Learning Interactive Real-World Simulators ICLR 2024 Website
EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment arXiv 2024 Star
Empowering World Models with Reflection for Embodied Video Prediction ICML 2025
Streetscapes: Large-scale consistent street view generation using autoregressive video diffusion SIGGRAPH 2024
AVID: Adapting Video Diffusion Models to World Models arXiv 2024 Star
Pandora: Towards General World Model with Natural Language Actions and Video States arXiv 2024 Star
RoboScape: Physics-informed Embodied World Model arXiv 2025 Star
TesserAct: Learning 4D Embodied World Models arXiv 2025 Star

5 Applications of World Models

5.1 Game Intelligence

Title Pub. & Date Code / Project URL
World and Human Action Models towards Gameplay Ideation Nature 2025
GameFactory: Creating New Games with Generative Interactive Videos ICCV 2025 Star
Unbounded: A Generative Infinite Game of Character Life Simulation CVPR 2025 Website
GameGen-𝕏: Interactive Open-world Game Video Generation ICLR 2025 Star
Diffusion Models Are Real-Time Game Engines ICLR 2025 Website
Exploration-Driven Generative Interactive Environments ICLR 2025 Star
Matrix-Game: Interactive World Foundation Model arXiv 2025 Star
Mineworld: a Real-time and Open-source Interactive World Model on Minecraft arXiv 2025 Star
Model as a Game: On Numerical and Spatial Consistency for Generative Games arXiv 2025

5.2 Embodied Intelligence

Title Pub. & Date Code / Project URL
OpenEQA: Embodied Question Answering in the Era of Foundation Models CVPR 2024 Star
iVideoGPT: Interactive VideoGPTs are Scalable World Models NeurIPS 2024 Star
IRASim: A Fine-Grained World Model for Robot Manipulation ICCV 2025 Star
RoboScape: Physics-informed Embodied World Model arXiv 2025 Star
TesserAct: Learning 4D Embodied World Models arXiv 2025 Star
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning arXiv 2025 Star
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations ICML 2025 Star
DreamGen: Unlocking Generalization in Robot Learning through Video World Models arXiv 2025 Star
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation arXiv 2025 Website
EnerVerse-AC: Envisioning Embodied Environments with Action Condition arXiv 2025 Star
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation arXiv 2025 Star
Vidar: Embodied Video Diffusion Model for Generalist Bimanual Manipulation arXiv 2025
WorldVLA: Towards Autoregressive Action World Model arXiv 2025 Star
ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model arXiv 2025 Star
ORV: 4D Occupancy-centric Robot Video Generation arXiv 2025 Star
GWM: Towards Scalable Gaussian World Models for Robotic Manipulation ICCV 2025 Website
WorldEval: World Model as Real-World Robot Policies Evaluator arXiv 2025 Star

5.3 Urban Intelligence

Autonomous Driving

Title Pub. & Date Code / Project URL
Video generation models as world simulators OpenAI Research (2024) Website
GPT-4 technical report arXiv 2023
Visual Instruction Tuning NeurIPS 2023 Star
World models for autonomous driving: An initial survey IEEE T-IV 2024
Waymax: An accelerated, data-driven simulator for large-scale autonomous driving research arXiv 2023 Star
Planning-oriented autonomous driving CVPR 2023 Star
A survey on trajectory-prediction methods for autonomous driving IEEE T-IV 2022
BEVFormer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers ECCV 2022 Star
Transfusion: Robust lidar-camera fusion for 3D object detection with transformers CVPR 2022 Star
YOLOP: You only look once for panoptic driving perception MIR 2022 Star
Wayformer: Motion forecasting via simple & efficient attention networks ICRA 2023
Motion Transformer with Global Intention Localization and Local Movement Refinement NeurIPS 2022 Star
Query-Centric Trajectory Prediction CVPR 2023 Star
HPTR: Real-time motion prediction via heterogeneous polyline transformer with relative pose encoding NeurIPS 2023 Star
MotionDiffuser: Controllable multi-agent motion prediction using diffusion CVPR 2023
Tokenize the world into object-level knowledge to address long-tail events in autonomous driving arXiv 2024
GAIA-1: A generative world model for autonomous driving arXiv 2023 Website
DriveDreamer: Towards real-world-driven world models for autonomous driving ECCV 2024 Star
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving CVPR 2024 Star
OccWorld: Learning a 3D occupancy world model for autonomous driving ECCV 2024 Star
OccSora: 4D occupancy generation models as world simulators for autonomous driving arXiv 2024 Star
Vista: A generalizable driving world model with high fidelity and versatile controllability NeurIPS 2024 Star
Copilot4D: Learning unsupervised world models for autonomous driving via discrete diffusion ICLR 2024
MUVO: A multimodal generative world model for autonomous driving with geometric representations IEEE T-IV 2025 Star
UniWorld: Autonomous driving pre-training via world models arXiv 2023 Star
MetaUrban: A simulation platform for embodied AI in urban spaces ICLR 2025 Star
UrbanWorld: An urban world model for 3D city generation arXiv 2024 Star
Streetscapes: Large-scale consistent street view generation using autoregressive video diffusion SIGGRAPH 2024 Website

Autonomous Logistics & Urban Analytics

Title Pub. & Date Code / Project URL
Navigation World Models CVPR 2025 Star
Towards Autonomous Micromobility through Scalable Urban Simulation CVPR 2025 Star
Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation CVPR 2025 Star
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos CVPR 2025 Star
AirScape: An Aerial Generative World Model with Motion Controllability (AirScape) ACM MM 2025 Website
CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory ACL 2025 Star
CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space EMNLP 2025 Star
UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces ACL 2025 Star
GeoLLM: Extracting Geospatial Knowledge from Large Language Models ICLR 2024 Star
CityGPT: Empowering Urban Spatial Cognition of Large Language Models KDD 2025 Star
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence ICCV 2025 Star
GPS as a Control Signal for Image Generation CVPR 2025 Star
AI's Blind Spots: Geographic Knowledge and Diversity Deficit in Generated Urban Scenario arXiv 2025
AgentMove: A Large Language Model based Agentic Framework for Zero-shot Next Location Prediction NAACL 2025 Star
CAMS: A CityGPT-Powered Agentic Framework for Urban Human Mobility Simulation arXiv 2025 Star
Open-Set Living Need Prediction with Large Language Models ACL 2025 Star

5.4 Societal Intelligence

Title Pub. & Date Code / Project URL
AgentSociety: Large-scale simulation of LLM-driven generative agents ACL 2025, COLM 2025 Star
GenSim: A General Social Simulation Platform with Large Language Model based Agents NAACL 2025 Star
Simulating Human-like Daily Activities with Desire-driven Autonomy ICLR 2025 Star
EconAgent: Large language model-empowered agents for simulating macroeconomic activities ACL 2024 Star
Agent-Pro: Learning to evolve via policy-level reflection and optimization ACL 2024 Star
Exploring collaboration mechanisms for LLM agents: A social psychology view ACL 2024 Star
Cooperate or Collapse: Emergence of sustainability behaviors in a society of LLM agents NeurIPS 2024 Star
SocioDojo: Building Lifelong Analytical Agents with Real-world Text and Time Series ICLR 2024 Star
SRAP-Agent: Simulating and optimizing scarce resource allocation policy with LLM-based agent EMNLP 2024 Star
Generative agents: Interactive simulacra of human behavior UIST 2023 Star
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users arXiv 2025 Star
YuLan-OneSim: Towards the Next Generation of Social Simulator with Large Language Models arXiv 2025 Star
OASIS: Open Agent Social Interaction Simulations with One Million Agents arXiv 2024 Star
Project Sid: Many-agent simulations toward AI civilization arXiv 2024 Star
Network Formation and Dynamics Among Multi-LLMs arXiv 2024 Star
S3: Social-network Simulation System with Large Language Model-Empowered Agents arXiv 2023
Exploring large language models for communication games: An empirical study on werewolf arXiv 2023 Star
