examples(lerobot): collect-train-run MolmoAct2 example [WIP]#586

Open
sundargthb wants to merge 4 commits into strands-labs:main from sundargthb:examples/molmoact-training
Conversation

@sundargthb

Draft — do not merge until GPU-validated.

Adds a runnable 3-phase example for the MolmoAct2 data-flywheel blog (collect → train → run), under examples/lerobot/.

What it does

  • collect — drives the pretrained allenai/MolmoAct2-SO100_101 teacher in MuJoCo sim (two RGB cameras matching the checkpoint), records a LeRobot v3
    dataset with domain randomization, optional Hub push.
  • train — prints the upstream LeRobot fine-tune command (training runs upstream; MolmoAct2 is integrated as a LeRobot policy).
  • run — loads the fine-tuned checkpoint back through Robot() in sim (or mode="real").

Validation status

Statically validated: compiles, every sim API call checked against the tool spec, inputs aligned with the MolmoAct2-SO100_101 model card (norm_tag, two cameras, bf16 on a 24GB L4). Not yet run on a GPU.

Two items to confirm on a g6.4xlarge (L4 24GB)

@cagataycali

  1. Exact lerobot.scripts.train flags for the MolmoAct2 policy (authoritative ref: LeRobot docs/source/molmoact2.mdx). The printed command uses the standard pattern; confirm flag names.
  2. The collect path — that driving the teacher via run_policy(policy_object=...) while recording produces usable demonstrations (vs. the lower-level
    get_actions → set_joint_positions loop in molmoact2_sim_pickplace.py).
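
For comparison, the lower-level loop named in item 2 can be sketched as below. This is a minimal illustration with stub classes, not the code in molmoact2_sim_pickplace.py: `StubPolicy`, `StubSim`, the joint names, and the argument list of `get_actions` are all assumptions. Only the overall shape (an async `get_actions` returning a list of per-joint dicts, whose first entry is passed to `set_joint_positions`) follows what the reviews in this thread describe.

```python
import asyncio


class StubPolicy:
    """Hypothetical stand-in for the teacher policy."""

    async def get_actions(self, observation, instruction):
        # Assumed shape: a list of per-joint target dicts.
        return [{"shoulder_pan": 0.1, "gripper": 0.0}]


class StubSim:
    """Hypothetical stand-in for the simulation."""

    def __init__(self):
        self.applied = []

    def get_observation(self):
        return {"step": len(self.applied)}

    def set_joint_positions(self, positions):
        self.applied.append(dict(positions))
        return {"status": "success"}


async def drive(policy, sim, instruction, n_steps):
    """Query the policy each step and apply the first returned action."""
    for _ in range(n_steps):
        obs = sim.get_observation()
        actions = await policy.get_actions(obs, instruction)
        result = sim.set_joint_positions(actions[0])
        if result.get("status") == "error":
            raise RuntimeError(result)
    return sim.applied


applied = asyncio.run(drive(StubPolicy(), StubSim(), "pick up the cube", n_steps=3))
```

The explicit loop makes it easy to inspect every action before it is applied, which is the trade-off against the single run_policy(policy_object=...) call.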

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@yinsong1986 yinsong1986 left a comment

Summary

Adds a runnable 3-phase examples/lerobot/ CLI (collect / train / run) plus a README for the MolmoAct2 data-flywheel blog: drive the pretrained allenai/MolmoAct2-SO100_101 teacher in MuJoCo sim to record a LeRobot v3 dataset, print the upstream LeRobot fine-tune command, then load the fine-tuned checkpoint back through Robot(). Marked WIP / pending-GPU-validation. Scope is example-only (no library code touched), and most sim API calls (add_object, start_recording, run_policy incl. policy_object=, set_joint_positions, render, randomize, reset, async get_actions returning a list) line up with the actual SDK signatures.

Verification suggestions

  • python examples/lerobot/collect_train_run_molmoact2.py train is GPU-free and exercises the print path; safe to smoke-test in CI.
  • The collect / run phases need a GPU + weights, but the camera-setup bug flagged inline is reproducible without a model load (it surfaces the moment _build_scene runs the first add_camera dispatch).

sim._dispatch_action(
    "add_camera",
    {
        "camera_name": "image",

[MUST FIX] add_camera is called with camera_name, but the sim action's parameter is name (strands_robots/simulation/mujoco/simulation.py:1208). The dispatch router validates kwargs against the method signature and rejects unknown params with an error dict (simulation.py:2472-2480); camera_name is not in _FIELD_ALIASES (simulation.py:2407-2416) and add_camera has no **kwargs, so both _dispatch_action("add_camera", {"camera_name": ...}) calls (here at line 102 and at line 113) return {"status": "error", ...}.

Why this is must-fix and not deferrable: _build_scene discards the return value, so the failure is silent — no exception, no log. Both runnable phases (collect and run) then proceed with zero cameras: collect records a LeRobot dataset with no camera frames (the entire purpose of the collect phase), and run feeds observations missing the two RGB views the SO100_101 checkpoint requires. This is a deterministic data-loss bug on the only executable code paths, reproducible without a GPU, and directly contradicts the PR's claim that every sim API call was checked against the tool spec.

Resolution: rename the key to name ({"name": "image", ...} / {"name": "wrist_image", ...}). Note render at line 279 correctly uses camera_name — only add_camera takes name. Consider also checking the dispatch result in _build_scene so a future schema mismatch fails loud per AGENTS.md ("No silent defaults on error").
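
To illustrate the failure mode described above, here is a minimal sketch of a dispatcher that validates kwargs against the target method's signature and returns an error dict instead of raising. It is not the strands_robots implementation: `FakeSim`, its `dispatch` method, and the `width`/`height` parameters are hypothetical, and the real router also consults an alias table that is omitted here.

```python
import inspect


class FakeSim:
    """Hypothetical sim with a signature-checking dispatcher."""

    def add_camera(self, name, width=640, height=480):
        return {"status": "success", "camera": name}

    def dispatch(self, action, params):
        method = getattr(self, action, None)
        if method is None:
            return {"status": "error", "message": f"unknown action: {action}"}
        sig = inspect.signature(method)
        accepts_var_kw = any(
            p.kind is inspect.Parameter.VAR_KEYWORD for p in sig.parameters.values()
        )
        # Reject keys the method does not declare, unless it takes **kwargs.
        unknown = sorted(set(params) - set(sig.parameters))
        if unknown and not accepts_var_kw:
            return {
                "status": "error",
                "message": f"unknown params for {action}: {unknown}",
            }
        return method(**params)


sim = FakeSim()
bad = sim.dispatch("add_camera", {"camera_name": "image"})  # error dict, no exception
good = sim.dispatch("add_camera", {"name": "image"})
```

Because `bad` is just a return value, a caller that ignores it carries on with no camera, which is the silent path the comment above describes.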

Runnable 3-phase CLI for the MolmoAct2 data-flywheel blog:

- collect: drive the pretrained allenai/MolmoAct2-SO100_101 teacher in MuJoCo
  sim (two RGB cameras matching the checkpoint), record a LeRobot v3 dataset
  with domain randomization, optionally push to the Hub.
- train: print the upstream LeRobot fine-tune command (training runs upstream;
  MolmoAct2 is integrated as a LeRobot policy).
- run: load the fine-tuned checkpoint back through Robot() in sim (or mode=real).

Statically validated: compiles, every sim API call checked against the tool
spec, inputs aligned with the MolmoAct2-SO100_101 model card (norm_tag, two
cameras, bf16 on a 24GB L4). PENDING GPU validation on a g6.4xlarge for two
items flagged in the README: exact lerobot train flags, and the
record-via-run_policy path.
@sundargthb sundargthb force-pushed the examples/molmoact-training branch from 9e17083 to 27e5ba5 Compare June 22, 2026 21:49

@yinsong1986 yinsong1986 left a comment

Summary

Adds a runnable 3-phase examples/lerobot/ CLI (collect / train / run) plus a README for the MolmoAct2 data-flywheel blog: drive the pretrained allenai/MolmoAct2-SO100_101 teacher in MuJoCo sim to record a LeRobot v3 dataset, print the upstream LeRobot fine-tune command, then load the fine-tuned checkpoint back through Robot(). Scope is example-only (no library code touched) and the PR is explicitly marked WIP / do-not-merge-until-GPU-validated. I checked the sim dispatch surface against the actual SDK: add_object, start_recording (repo_id/task/fps/push_to_hub), run_policy (incl. policy_object=/n_steps), set_joint_positions (accepts the per-joint dict from get_actions()[0]), render (camera_name), randomize, and reset all line up with their signatures.

The one deterministic blocker on the executable paths — add_camera being dispatched with camera_name instead of name at both _build_scene call sites — is already flagged in full by the prior inline review (yinsong1986), including its silent-data-loss impact. I am not re-raising it. The remaining open questions (exact lerobot.scripts.train flags, whether the run_policy recording path yields usable demos) are the author's own [verify] items pending the first GPU run, not independent review findings. No new must-fix concerns to add; deferring to the existing review thread and the human reviewer.

…m errors

Addresses review on strands-labs#586. add_camera's parameter is 'name' (not
'camera_name'); the dispatch router rejects unknown kwargs, so the prior
calls silently returned error dicts and produced a camera-less scene,
breaking both collect (no frames) and run (no images). Rename to 'name',
and route all _build_scene dispatches through a _must() helper that raises
on status:error per AGENTS.md (no silent defaults on error). Re-audited the
other dispatched calls (start_recording / run_policy / randomize /
get_observation / set_joint_positions / render) against their signatures.
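
A helper of the kind this commit message describes could look roughly like the sketch below. The body of the PR's actual _must() is not shown in this thread, so `must_succeed`, its `what` argument, and the exception type are assumptions; the only behaviour taken from the source is "raise when the dispatch result has status: error".

```python
def must_succeed(result, what):
    """Return result unchanged, or raise if it is an error dict."""
    if isinstance(result, dict) and result.get("status") == "error":
        raise RuntimeError(f"{what} failed: {result.get('message', result)}")
    return result


ok = must_succeed({"status": "success"}, "add_camera(image)")

try:
    must_succeed(
        {"status": "error", "message": "unknown params: ['camera_name']"},
        "add_camera(image)",
    )
    raised = False
except RuntimeError as exc:
    raised = True
    message = str(exc)
```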
@sundargthb sundargthb requested a review from yinsong1986 June 23, 2026 00:01

@yinsong1986 yinsong1986 left a comment

Summary

Adds an examples-only, explicitly-WIP 3-phase CLI (collect / train / run) plus a README under examples/lerobot/ for the MolmoAct2 data-flywheel blog: drive the pretrained allenai/MolmoAct2-SO100_101 teacher in MuJoCo sim to record a LeRobot v3 dataset (collect), print the upstream LeRobot fine-tune command for the user to run (train), then load the fine-tuned checkpoint back through Robot() (run). No library code is touched. The diff is consistent with the PR description's stated scope and draft/pending-GPU-validation status.

What's good

  • The prior review's blocker (add_camera called with camera_name instead of name) is resolved: the scene builder now uses name and routes every dispatch through _must, which raises on {"status": "error"} rather than silently no-op'ing a missing camera — matching AGENTS.md "No silent defaults on error."
  • train only print()s a copy-pasteable command; it does not shell out, so there is no subprocess-injection surface from the user-supplied --hf-user / --dataset-name args.
  • The two highest-risk behavioral assumptions (whether run_policy(policy_object=...) records usable demos, and the exact lerobot.scripts.train flag names) are self-flagged [verify] in both the script and the README, which is the right call for a draft pending hardware validation.
  • Files are ASCII-clean (the README's only non-ASCII characters are box-drawing characters in a fenced ASCII-art diagram, not emojis in tool/log/error strings).
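
As a side note on the print-only point: a printed command is not an injection sink in the script itself, but a reader who pastes it into a shell still benefits from quoted values. The sketch below shows one way to do that with the standard library's shlex.join. The module name `example_trainer` and both flag names are placeholders chosen for illustration; they are not the lerobot flags, which remain a [verify] item in this PR.

```python
import shlex


def format_command(program, options):
    """Build a copy-pasteable command string with shell-safe quoting."""
    parts = list(program)
    for flag, value in options.items():
        parts.append(f"{flag}={value}")
    return shlex.join(parts)


cmd = format_command(
    ["python", "-m", "example_trainer"],  # placeholder module name
    {"--dataset-id": "alice/demo; rm -rf ~", "--steps": 1000},  # placeholder flags
)
print(cmd)
```

Any value containing shell metacharacters ends up inside single quotes, so the pasted command treats it as one argument.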

Verification suggestions

The author's own [verify] items are the right spot-checks on the first GPU run: confirm collect produces a non-empty LeRobot v3 dataset (parquet + per-camera MP4) end-to-end (start → record N episodes → stop → reopen → assert frames + both image/wrist_image keys present), and confirm the printed train flags against ... train --help on the MolmoAct2 LeRobot branch.
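
The "assert frames + both camera keys present" part of that check could be sketched as follows. This is a hedged illustration over plain dicts: how the frames are loaded from a recorded LeRobot dataset is left out, and the assumption that a camera appears either as a bare key or as a dotted suffix such as `observation.images.image` is mine, not something confirmed in this thread.

```python
def check_frames(frames, required_cameras=("image", "wrist_image")):
    """Raise if the dataset is empty or any frame lacks a required camera."""
    if not frames:
        raise ValueError("dataset has no frames")
    for index, frame in enumerate(frames):
        missing = [
            cam
            for cam in required_cameras
            if not any(key == cam or key.endswith("." + cam) for key in frame)
        ]
        if missing:
            raise ValueError(f"frame {index} is missing cameras: {missing}")
    return len(frames)


good_frames = [
    {"observation.images.image": b"...", "observation.images.wrist_image": b"..."},
    {"observation.images.image": b"...", "observation.images.wrist_image": b"..."},
]
count = check_frames(good_frames)

try:
    check_frames([{"observation.images.wrist_image": b"..."}])
    failed = False
except ValueError:
    failed = True
```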

@yinsong1986 yinsong1986 left a comment

Summary

Examples-only, explicitly-WIP PR adding a 3-phase CLI (collect / train / run) plus a README under examples/lerobot/ for the MolmoAct2 data-flywheel blog: drive the pretrained allenai/MolmoAct2-SO100_101 teacher in MuJoCo sim to record a LeRobot v3 dataset, print (not execute) the upstream LeRobot fine-tune command, then load the fine-tuned checkpoint back through Robot(). No library code is touched.

Walked the full file (all 356 lines). No merge-blocking concerns: the train command interpolates user-supplied values into a printed string only (no subprocess, no shell, no injection sink), the Path.home() joins are not traversal sinks, and no secrets are logged. The add_camera calls now correctly use name (resolving the prior review's camera_name MUST FIX), and the sim dispatch calls match the current simulation.py signatures. The PR is appropriately gated as draft/pending-GPU-validation, and the two open questions (run_policy collection path, exact lerobot.scripts.train flags) are already surfaced by the author and prior reviewer as [verify] items for the first hardware run — those are validation gaps, not code blockers.

What's good

  • Scope discipline: examples-only, zero library mutations.
  • _must() wrapper surfaces error dicts from scene-setup dispatch instead of silently no-op'ing, matching AGENTS.md "no silent defaults on error."
  • No emojis in user-facing strings; plain ASCII throughout.
  • Honest WIP framing with the unverified assumptions called out inline.

Projects

Status: In review

3 participants