
TD-MPC2 Implementation #159

Open

luizfacury wants to merge 13 commits into galilai-group:main from luizfacury:tdmpc2

Conversation

@luizfacury (Contributor)

This pull request introduces a new training pipeline for the TD-MPC2 world model and policy. The main changes include adding a comprehensive configuration file for TD-MPC2, implementing a new training script with data preprocessing and model management, and registering the TD-MPC2 model in the world model package. These updates enable flexible, modular training of TD-MPC2 using Hydra and PyTorch Lightning, and support multiple observation modalities.

TD-MPC2 Training Pipeline Integration:

  • Added a new Hydra configuration file tdmpc2.yaml specifying all hyperparameters for data loading, model architecture, planning, optimization, and logging for the TD-MPC2 algorithm.
  • Implemented tdmpc2.py, a new training script that:
    • Dynamically builds datasets and preprocessing transforms for multiple modalities (e.g., pixels, state).
    • Defines the TD-MPC2 forward pass, loss computation, and policy update logic.
    • Sets up PyTorch Lightning training, including a callback for periodic model checkpointing.
    • Integrates with the stable world model and stable pretraining libraries for modular training and data management.

Model Registration:

  • Registered the TD-MPC2 model and module in the stable_worldmodel.wm package by updating the __init__.py file, making it available for import and use throughout the codebase.

@luizfacury luizfacury changed the title tdmpc2 implementation TD-MPC2 Implementation Mar 13, 2026
@quentinll (Collaborator) left a comment

Thank you for this nice contribution! Do you already have results using TDMPC2 in some environments?

return G + discount * (1 - termination) * conservative_q

@torch.no_grad()
def _plan(self, obs_dict, goal_dict, step_idxs, eval_mode=False):
Collaborator:

It would be nice to use the MPPI algorithm already implemented in stable-worldmodel. I believe one way would be to use the policy API defined here: https://github.com/galilai-group/stable-worldmodel/blob/main/stable_worldmodel/policy.py

Contributor Author @luizfacury · Mar 14, 2026:

I tested it on PushT and got about an 85% success rate predicting 50 steps ahead from a random state in a trajectory, repeated 100 times.

About the solver: TD-MPC2 draws some of its candidate trajectories directly from the actor network, rather than from pure Gaussian noise. From their paper:

"To accelerate convergence of planning, a fraction of action sequences originate from the policy prior π, and we warm-start planning by initializing (μ, σ) as the solution to the previous decision step shifted by 1."

From what I understand, your standard MPPI updates only the mean of the distribution, not the variance, and all of the trajectories are randomly generated. I tested using the implemented MPPI, and for me at least it hallucinated much more with OOD actions.
I used your FeedForwardPolicy to work with it.
I can change it if you want; I just tried to stay as close as possible to their implementation.
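For readers following along, the hybrid sampling scheme described above (policy-prior candidates mixed with Gaussian samples, shift-by-one warm start, and updating both the mean and the std of the sampling distribution) can be sketched roughly as follows. This is a minimal NumPy illustration, not the PR's actual code; all names and hyperparameters are made up:

```python
import numpy as np

def tdmpc2_mppi_step(score_fn, policy_samples, prev_mean, horizon, act_dim,
                     n_samples=256, n_elites=32, n_iters=6, temperature=0.5):
    """One planning step: mix policy-prior trajectories with Gaussian samples
    and update BOTH mean and std from the elite set (unlike a plain MPPI that
    only re-weights the mean around a fixed variance)."""
    # Warm start: shift the previous solution by one step, repeat the tail.
    mean = np.concatenate([prev_mean[1:], prev_mean[-1:]], axis=0)
    std = np.full((horizon, act_dim), 0.5)

    for _ in range(n_iters):
        noise = np.random.randn(n_samples, horizon, act_dim)
        actions = np.clip(mean + std * noise, -1.0, 1.0)
        # A fraction of candidates comes from the actor (policy prior),
        # assumed already bounded to [-1, 1].
        actions = np.concatenate([actions, policy_samples], axis=0)

        returns = score_fn(actions)                    # shape: (num_candidates,)
        elite_idx = np.argsort(returns)[-n_elites:]
        elite_actions = actions[elite_idx]
        elite_returns = returns[elite_idx]

        # Softmax-weighted update of mean AND std over the elites.
        w = np.exp(temperature * (elite_returns - elite_returns.max()))
        w /= w.sum()
        mean = (w[:, None, None] * elite_actions).sum(axis=0)
        std = np.sqrt((w[:, None, None] * (elite_actions - mean) ** 2).sum(axis=0))
        std = np.clip(std, 0.05, 2.0)

    return mean, std
```

The key differences from a mean-only MPPI are the concatenated policy-prior samples and the re-estimated std, which lets the search contract around good action sequences.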

Collaborator:

Our MPPI implementation already supports starting from an initial guess (see init_action_distrib), so it is possible to warm-start it using the output of an actor network. For now, FeedForwardPolicy does not support hybrid strategies like TD-MPC. I believe we should add a new policy class that allows using MPC starting from an actor's guess.

Contributor Author:

Ok, I created a TDMPC policy that warm-starts the planner from the actor. I tested on PushT again and it's working for me.

Contributor Author:

Do you need something else for this model? @quentinll

Collaborator:

Hey Luiz, thanks again for this very nice contribution. This seems close to what I had in mind in terms of implementation; I will have a closer look ASAP. Which script are you using for evaluation? In the meantime, it would be nice if you could run additional evaluations on other environments such as tworoom and OGBench cube. What do you think?

Contributor Author:

Sure, I will download the data and test on tworoom and cube. I will let you know when I have the results.

Collaborator:

I made quite a few changes to how the policy uses the actor to warm-start the solver. I removed the TDMPCPolicy and extended the WorldModelPolicy to support this. Could you let me know if you are still getting the same results? In theory the behavior should be identical.

Assumptions:
- Continuous Control: The algorithm assumes continuous action spaces.
- Action Bounds: Actions are strictly assumed to be normalized to the range [-1.0, 1.0]. The actor network and MPPI planner enforce this bound via Tanh and clamping.
Collaborator:

Is this still accurate?

Contributor Author:

Yes, TD-MPC2 uses a squashed Gaussian (tanh at the output), so actions are bounded to (-1, 1). Normalizing to this range is required for the two-hot value distribution and symlog reward scaling to be correct.
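For context, the symlog scaling and two-hot encoding mentioned above can be sketched as follows (illustrative helper names and grid bounds, not the PR's code):

```python
import math

def symlog(x):
    """Symmetric log squashing for rewards/values: sign(x) * ln(1 + |x|)."""
    return math.copysign(math.log1p(abs(x)), x)

def symexp(x):
    """Inverse of symlog: sign(x) * (exp(|x|) - 1)."""
    return math.copysign(math.expm1(abs(x)), x)

def two_hot(x, vmin=-10.0, vmax=10.0, num_bins=101):
    """Encode a scalar as weight spread over the two adjacent bins of a fixed
    grid. This only makes sense when x lands inside [vmin, vmax], which is why
    a known bounded range for values/rewards matters."""
    x = min(max(x, vmin), vmax)
    bin_size = (vmax - vmin) / (num_bins - 1)
    idx = (x - vmin) / bin_size
    lo = int(math.floor(idx))
    hi = min(lo + 1, num_bins - 1)
    w_hi = idx - math.floor(idx)
    probs = [0.0] * num_bins
    probs[lo] += 1.0 - w_hi
    probs[hi] += w_hi
    return probs
```

For example, with the default grid a target of 0.25 puts weight 0.75 on the bin at 0.2 and 0.25 on the bin at 0.4; the decoded value is the probability-weighted sum of bin centers.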

Comment on lines +153 to +164
if self.use_pixels:
    self.cnn = nn.Sequential(
        nn.Conv2d(6, 32, 7, stride=2),
        nn.Mish(),
        nn.Conv2d(32, 32, 5, stride=2),
        nn.Mish(),
        nn.Conv2d(32, 32, 3, stride=2),
        nn.Mish(),
        nn.Conv2d(32, 32, 3, stride=1),
        nn.Mish(),
        nn.Flatten(),
    )
Collaborator:

We usually use 224x224 images; would that be an issue?

Contributor Author:

It's fine: the CNN works regardless of image size, and the output dimension is configured accordingly. I tested different dimensions too.
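For reference, the flattened output dimension of the encoder quoted above follows from standard conv arithmetic; a quick sketch (assuming no padding, matching the snippet, with illustrative helper names):

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of one Conv2d layer: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def encoder_flat_dim(h, w, channels=32):
    """Flattened feature count after the four conv layers in the snippet
    (kernels 7/5/3/3, strides 2/2/2/1, no padding)."""
    for k, s in [(7, 2), (5, 2), (3, 2), (3, 1)]:
        h, w = conv_out(h, k, s), conv_out(w, k, s)
    return channels * h * w
```

With 224x224 inputs this yields a 24x24x32 feature map (18432 features); with 64x64 it yields 4x4x32 (512), so the downstream projection just needs to be sized from this value.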

continue # Handled by primary backbone

in_dim = cfg.extra_dims[key] * 2
self.extra_encoders[key] = nn.Sequential(
Collaborator:

Why should the extra encoders necessarily instantiate these networks?

Contributor Author:

You're right, I changed it.

self.reward = mlp(
    self.latent_dim + cfg.action_dim, cfg.wm.mlp_dim, cfg.wm.num_bins
)
self.pi = mlp(self.latent_dim, cfg.wm.mlp_dim, 2 * cfg.action_dim)
Collaborator:

Can you add a comment to help people understand what pi is, etc.? (I know it's the policy, but it might be confusing for newcomers.)

Contributor Author:

Added.
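For newcomers wondering about the 2 * cfg.action_dim output of the pi head above: in a squashed-Gaussian actor it is typically split into a mean and a clamped log-std, with tanh bounding the sampled action. A hedged NumPy sketch (illustrative names and clamp values, not the PR's code):

```python
import numpy as np

def sample_squashed_gaussian(pi_out, log_std_min=-10.0, log_std_max=2.0, rng=None):
    """Interpret a 2 * action_dim policy-head output: first half is the
    Gaussian mean, second half the log-std (clamped for stability).
    The sampled action is squashed into (-1, 1) with tanh."""
    rng = rng or np.random.default_rng()
    mean, log_std = np.split(np.asarray(pi_out, dtype=float), 2, axis=-1)
    log_std = np.clip(log_std, log_std_min, log_std_max)
    eps = rng.standard_normal(mean.shape)
    pre_tanh = mean + np.exp(log_std) * eps
    return np.tanh(pre_tanh)  # bounded action in (-1, 1)
```

The tanh at the end is what guarantees the [-1, 1] action bound discussed earlier in this thread.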

for p in self.target_qs.parameters():
    p.requires_grad = False

def encode(self, obs_dict, goal_dict):
Collaborator:

Why does the encoder take goal_dict?

Cf. DINO-WM: you can simply have encode(dict).

Contributor Author:

Fixed


def forward(self, z, action):
"""
Predicts the next latent state and expected reward given the current latent state and action.
Collaborator:

Wouldn't it be clearer if it were called predict?

Contributor Author:

I used forward because it's PyTorch's convention: it lets you call the module directly as model().

Comment on lines +353 to +357
if key != 'pixels':
    if obs.ndim >= 3:
        obs = obs[..., -1, :]
    if goal.ndim >= 3:
        goal = goal[..., -1, :]
Collaborator:

Redundant code; have a look at the DINO-WM implementation.

Contributor Author:

Ok, I think I improved it.
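One plausible deduplication of the branches quoted above is to collapse them into a single helper applied to both obs and goal (an illustrative sketch, not necessarily the final PR code):

```python
import numpy as np

def last_step(x):
    """Collapse an optional time dimension: if a non-pixel observation has
    shape (..., T, D), keep only the most recent step; otherwise pass through."""
    return x[..., -1, :] if x.ndim >= 3 else x
```

Both obs and goal then go through last_step once, instead of duplicating the ndim check per tensor.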
