examples(lerobot): collect-train-run MolmoAct2 example [WIP]#586
examples(lerobot): collect-train-run MolmoAct2 example [WIP]#586sundargthb wants to merge 4 commits into
Conversation
yinsong1986
left a comment
There was a problem hiding this comment.
Summary
Adds a runnable 3-phase examples/lerobot/ CLI (collect / train / run) plus a README for the MolmoAct2 data-flywheel blog: drive the pretrained allenai/MolmoAct2-SO100_101 teacher in MuJoCo sim to record a LeRobot v3 dataset, print the upstream LeRobot fine-tune command, then load the fine-tuned checkpoint back through Robot(). Marked WIP / pending-GPU-validation. Scope is example-only (no library code touched), and most sim API calls (add_object, start_recording, run_policy incl. policy_object=, set_joint_positions, render, randomize, reset, async get_actions returning a list) line up with the actual SDK signatures.
Verification suggestions
python examples/lerobot/collect_train_run_molmoact2.py trainis GPU-free and exercises the print path; safe to smoke-test in CI.- The
collect/runphases need a GPU + weights, but the camera-setup bug flagged inline is reproducible without a model load (it surfaces the moment_build_sceneruns the firstadd_cameradispatch).
| sim._dispatch_action( | ||
| "add_camera", | ||
| { | ||
| "camera_name": "image", |
There was a problem hiding this comment.
[MUST FIX] add_camera is called with camera_name, but the sim action's parameter is name (strands_robots/simulation/mujoco/simulation.py:1208). The dispatch router validates kwargs against the method signature and rejects unknown params with an error dict (simulation.py:2472-2480); camera_name is not in _FIELD_ALIASES (simulation.py:2407-2416) and add_camera has no **kwargs, so both _dispatch_action("add_camera", {"camera_name": ...}) calls (here at line 102 and at line 113) return {"status": "error", ...}.
Why this is must-fix and not deferrable: _build_scene discards the return value, so the failure is silent — no exception, no log. Both runnable phases (collect and run) then proceed with zero cameras: collect records a LeRobot dataset with no camera frames (the entire purpose of the collect phase), and run feeds observations missing the two RGB views the SO100_101 checkpoint requires. This is a deterministic data-loss bug on the only executable code paths, reproducible without a GPU, and directly contradicts the PR's claim that every sim API call was checked against the tool spec.
Resolution: rename the key to name ({"name": "image", ...} / {"name": "wrist_image", ...}). Note render at line 279 correctly uses camera_name — only add_camera takes name. Consider also checking the dispatch result in _build_scene so a future schema mismatch fails loud per AGENTS.md ("No silent defaults on error").
Runnable 3-phase CLI for the MolmoAct2 data-flywheel blog: - collect: drive the pretrained allenai/MolmoAct2-SO100_101 teacher in MuJoCo sim (two RGB cameras matching the checkpoint), record a LeRobot v3 dataset with domain randomization, optionally push to the Hub. - train: print the upstream LeRobot fine-tune command (training runs upstream; MolmoAct2 is integrated as a LeRobot policy). - run: load the fine-tuned checkpoint back through Robot() in sim (or mode=real). Statically validated: compiles, every sim API call checked against the tool spec, inputs aligned with the MolmoAct2-SO100_101 model card (norm_tag, two cameras, bf16 on a 24GB L4). PENDING GPU validation on a g6.4xlarge for two items flagged in the README: exact lerobot train flags, and the record-via-run_policy path.
9e17083 to
27e5ba5
Compare
yinsong1986
left a comment
There was a problem hiding this comment.
Summary
Adds a runnable 3-phase examples/lerobot/ CLI (collect / train / run) plus a README for the MolmoAct2 data-flywheel blog: drive the pretrained allenai/MolmoAct2-SO100_101 teacher in MuJoCo sim to record a LeRobot v3 dataset, print the upstream LeRobot fine-tune command, then load the fine-tuned checkpoint back through Robot(). Scope is example-only (no library code touched) and the PR is explicitly marked WIP / do-not-merge-until-GPU-validated. I checked the sim dispatch surface against the actual SDK: add_object, start_recording (repo_id/task/fps/push_to_hub), run_policy (incl. policy_object=/n_steps), set_joint_positions (accepts the per-joint dict from get_actions()[0]), render (camera_name), randomize, and reset all line up with their signatures.
The one deterministic blocker on the executable paths — add_camera being dispatched with camera_name instead of name at both _build_scene call sites — is already flagged in full by the prior inline review (yinsong1986), including its silent-data-loss impact. I am not re-raising it. The remaining open questions (exact lerobot.scripts.train flags, whether the run_policy recording path yields usable demos) are the author's own [verify] items pending the first GPU run, not independent review findings. No new must-fix concerns to add; deferring to the existing review thread and the human reviewer.
…m errors Addresses review on strands-labs#586. add_camera's parameter is 'name' (not 'camera_name'); the dispatch router rejects unknown kwargs, so the prior calls silently returned error dicts and produced a camera-less scene, breaking both collect (no frames) and run (no images). Rename to 'name', and route all _build_scene dispatches through a _must() helper that raises on status:error per AGENTS.md (no silent defaults on error). Re-audited the other dispatched calls (start_recording / run_policy / randomize / get_observation / set_joint_positions / render) against their signatures.
yinsong1986
left a comment
There was a problem hiding this comment.
Summary
Adds an examples-only, explicitly-WIP 3-phase CLI (collect / train / run) plus a README under examples/lerobot/ for the MolmoAct2 data-flywheel blog: drive the pretrained allenai/MolmoAct2-SO100_101 teacher in MuJoCo sim to record a LeRobot v3 dataset (collect), print the upstream LeRobot fine-tune command for the user to run (train), then load the fine-tuned checkpoint back through Robot() (run). No library code is touched. The diff is consistent with the PR description's stated scope and draft/pending-GPU-validation status.
What's good
- The prior review's blocker (
add_cameracalled withcamera_nameinstead ofname) is resolved: the scene builder now usesnameand routes every dispatch through_must, which raises on{"status": "error"}rather than silently no-op'ing a missing camera — matching AGENTS.md "No silent defaults on error." trainonlyprint()s a copy-pasteable command; it does not shell out, so there is no subprocess-injection surface from the user-supplied--hf-user/--dataset-nameargs.- The two highest-risk behavioral assumptions (whether
run_policy(policy_object=...)records usable demos, and the exactlerobot.scripts.trainflag names) are self-flagged[verify]in both the script and the README, which is the right call for a draft pending hardware validation. - Files are ASCII-clean (the README's
│/▼are box-drawing in a fenced ASCII-art diagram, not emojis in tool/log/error strings).
Verification suggestions
The author's own [verify] items are the right spot-checks on the first GPU run: confirm collect produces a non-empty LeRobot v3 dataset (parquet + per-camera MP4) end-to-end (start → record N episodes → stop → reopen → assert frames + both image/wrist_image keys present), and confirm the printed train flags against ... train --help on the MolmoAct2 LeRobot branch.
yinsong1986
left a comment
There was a problem hiding this comment.
Summary
Examples-only, explicitly-WIP PR adding a 3-phase CLI (collect / train / run) plus a README under examples/lerobot/ for the MolmoAct2 data-flywheel blog: drive the pretrained allenai/MolmoAct2-SO100_101 teacher in MuJoCo sim to record a LeRobot v3 dataset, print (not execute) the upstream LeRobot fine-tune command, then load the fine-tuned checkpoint back through Robot(). No library code is touched.
Walked the full file (all 356 lines). No merge-blocking concerns: the train command interpolates user-supplied values into a printed string only (no subprocess, no shell, no injection sink), the Path.home() joins are not traversal sinks, and no secrets are logged. The add_camera calls now correctly use name (resolving the prior review's camera_name MUST FIX), and the sim dispatch calls match the current simulation.py signatures. The PR is appropriately gated as draft/pending-GPU-validation, and the two open questions (run_policy collection path, exact lerobot.scripts.train flags) are already surfaced by the author and prior reviewer as [verify] items for the first hardware run — those are validation gaps, not code blockers.
What's good
- Scope discipline: examples-only, zero library mutations.
_must()wrapper surfaces error dicts from scene-setup dispatch instead of silently no-op'ing, matching AGENTS.md "no silent defaults on error."- No emojis in user-facing strings; plain ASCII throughout.
- Honest WIP framing with the unverified assumptions called out inline.
Draft — do not merge until GPU-validated.
Adds a runnable 3-phase example for the MolmoAct2 data-flywheel blog (collect → train → run), under
examples/lerobot/.What it does
allenai/MolmoAct2-SO100_101teacher in MuJoCo sim (two RGB cameras matching the checkpoint), records a LeRobot v3dataset with domain randomization, optional Hub push.
Robot()in sim (ormode="real").Validation status
Statically validated: compiles, every sim API call checked against the tool spec, inputs aligned with the MolmoAct2-SO100_101 model card (norm_tag, two cameras, bf16 on a 24GB L4). Not yet run on a GPU.
Two items to confirm on a g6.4xlarge (L4 24GB)
@cagataycali
lerobot.scripts.trainflags for the MolmoAct2 policy (authoritative ref: LeRobotdocs/source/molmoact2.mdx). The printed command uses the standard pattern; confirm flag names.run_policy(policy_object=...)while recording produces usable demonstrations (vs. the lower-levelget_actions→set_joint_positionsloop inmolmoact2_sim_pickplace.py).By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.