The provided infer script:

python inference/real_world/Motus/inference_example.py \
  --model_config inference/real_world/Motus/utils/ac_one.yaml \
  --ckpt_dir pretrained_models/Motus \
  --wan_path pretrained_models \
  --image examples/first_frame.png \
  --instruction "Pour water from kettle to flowers" \
  --use_t5 \
  --output examples/output_ac_one.png

gives this error:

WARNING: failed to load checkpoint: Error(s) in loading state_dict for Motus:
  size mismatch for action_expert.input_encoder.pos_embedding: copying a param with shape torch.Size([1, 8, 1024]) from checkpoint, the shape in current model is torch.Size([1, 54, 1024]).
  size mismatch for action_module.action_expert.input_encoder.pos_embedding: copying a param with shape torch.Size([1, 8, 1024]) from checkpoint, the shape in current model is torch.Size([1, 54, 1024]).

and the run then crashes with:

  Motus/inference/real_world/Motus/bak/wan/modules/attention.py", line 110, in flash_attention
    deterministic=deterministic)[0].unflatten(0, (b, lq))
    ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/_tensor.py", line 1432, in unflatten
    return super().unflatten(dim, sizes)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: unflatten: Provided sizes [1, 545] don't multiply up to the size of dim 0 (24) in the input tensor
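The size-mismatch warning above can be reproduced in isolation: the checkpoint stores a positional embedding with 8 positions while the model built from the current config allocates 54. The sketch below is a minimal stand-in (the `Encoder` class and shapes are hypothetical, only the parameter name and sizes are taken from the log); note that PyTorch raises on shape mismatches even with `strict=False`, so the config (e.g. the action horizon / chunk size) has to match the checkpoint.

```python
import torch
import torch.nn as nn

# Hypothetical minimal module mirroring the mismatched parameter in the log.
class Encoder(nn.Module):
    def __init__(self, seq_len: int, dim: int = 1024):
        super().__init__()
        self.pos_embedding = nn.Parameter(torch.zeros(1, seq_len, dim))

ckpt_model = Encoder(seq_len=8)      # shape stored in the checkpoint
current_model = Encoder(seq_len=54)  # shape the current config builds

try:
    # Raises RuntimeError: size mismatch for pos_embedding.
    # strict=False would NOT avoid this -- it only skips
    # missing/unexpected keys, not shape mismatches.
    current_model.load_state_dict(ckpt_model.state_dict())
except RuntimeError as e:
    print("size mismatch" in str(e))  # True
```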
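The final RuntimeError is a generic `Tensor.unflatten` contract violation: unflattening dim 0 into `(b, lq)` requires `b * lq` to equal the size of that dim, but here the attention output has 24 rows while the caller expects `(1, 545)`. A minimal sketch using only stock PyTorch (the concrete numbers 24, 1, and 545 are taken from the traceback; the tensor itself is made up):

```python
import torch

# unflatten(0, (b, lq)) requires b * lq == size of dim 0.
x = torch.randn(24, 64)

ok = x.unflatten(0, (4, 6))   # fine: 4 * 6 == 24 -> shape (4, 6, 64)

try:
    x.unflatten(0, (1, 545))  # 1 * 545 != 24 -> RuntimeError, as in the log
except RuntimeError as e:
    print(e)
```

This suggests the attention layer received a different sequence length than the caller assumed, which is consistent with the positional-embedding shape mismatch reported earlier in the run.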